[LWN.net]


Kernel development


The current development kernel release is 2.5.1. Linus's 2.5.2 prepatch is up to 2.5.2-pre6. This prepatch contains more block I/O work, of course, though that effort seems to be winding down - for now. So this prepatch also has room for a number of other things, including a merge of many of the fixes from the "dj" patch series, Al Viro's namespace patch (described in the March 1, 2001 LWN kernel page), some scheduler work from Davide Libenzi, and a USB update that includes beginning support for USB 2.0.

One of those "other things" is a 'new and anal' kdev_t type. kdev_t, the internal kernel representation for device numbers, has traditionally just been the user-space dev_t in disguise. It is now defined as a structure, so that the compiler will flag all kernel code which treats kdev_t as a simple number. Even proper code needs editing, however, since the macros which manipulate kdev_t have changed. As of -pre6, there is a lot of code which still needs work and which, thus, does not compile. The -pre6 prepatch is best left to people who are interested in tracking down these sorts of problems.
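The structure trick can be sketched in a few lines of C. This is only an illustration of the idea, not the kernel's actual header: every name below is hypothetical, and the 8-bit major / 8-bit minor split is assumed to match the traditional 16-bit device number.

```c
/* Hypothetical names throughout; this sketches the idea only and is
 * not copied from the kernel's kdev_t header. */

/* The device number is wrapped in a structure.  Code such as
 * "dev == 0" or "dev >> 8" no longer compiles, so every place that
 * treats the value as a plain integer is found by the compiler. */
typedef struct {
    unsigned short value;
} demo_kdev_t;

/* Assumed layout: major number in the high 8 bits, minor in the low 8. */
#define DEMO_MINOR_BITS 8
#define DEMO_MINOR_MASK ((1U << DEMO_MINOR_BITS) - 1)

/* Build a device number from its major and minor parts. */
static inline demo_kdev_t demo_mk_kdev(unsigned int major, unsigned int minor)
{
    demo_kdev_t dev;

    dev.value = (unsigned short)((major << DEMO_MINOR_BITS) |
                                 (minor & DEMO_MINOR_MASK));
    return dev;
}

/* Accessors replace open-coded shifts and masks. */
static inline unsigned int demo_major(demo_kdev_t dev)
{
    return dev.value >> DEMO_MINOR_BITS;
}

static inline unsigned int demo_minor(demo_kdev_t dev)
{
    return dev.value & DEMO_MINOR_MASK;
}

/* Structures cannot be compared with ==, so equality needs a helper. */
static inline int demo_kdev_same(demo_kdev_t a, demo_kdev_t b)
{
    return a.value == b.value;
}
```

With a type like this, a driver that used to write "if (dev == other)" must be changed to call the comparison helper, which is exactly the kind of edit that even proper code now needs.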

The current stable kernel release is 2.4.17, released on December 21. There was some grumbling that the final 2.4.17 patch included a couple of new fixes; Marcelo's policy seems to be that obvious, simple bug fixes can go in even after the last release candidate.

The first 2.4.18 prepatch came out on December 26; it is a large patch with a number of architecture updates.

Other prepatches: Dave Jones's current prepatch is at 2.5.1-dj10. It tracks the Linus prepatches through 2.5.2-pre5, and, thus, does not yet contain the kdev_t work.

Michael Cohen has concluded that the world still needs a 2.4-based development tree. So, he has released 2.4.17-mjc1 to fill that need. It starts with 2.4.17, of course, but then adds Rik van Riel's reverse mapping patch, the preemptible kernel patch, software suspend, Andre Hedrick's IDE work, and more. Despite all that, Michael claims "I'll try to keep this as close to the 2.4.x line as possible."

2.2 users may be interested in 2.2.21-pre2 from Alan Cox.

Scheduler tweaks. The debate on what changes should be made to the scheduler in 2.5 has not yet really happened. Even so, Linus has started merging in tweaks to the existing algorithm, in the form of Davide Libenzi's Time Slice Split Scheduler patch. This patch changes the way the scheduler handles the "dynamic priority" of processes; the result, hopefully, is fairer scheduling with lower overhead.

The Linux scheduler has traditionally handled dynamic priority via a task structure field called counter; the number stored in counter is, essentially, the number of clock ticks left in the process's time slice. By using this count as a priority adjustment, the kernel tries to divide the processor relatively equally among processes that need it; a process which has not managed to use up much of its time slice will be selected over another which has exhausted most of its time.

The new scheduler separates dynamic priority from time slice accounting by replacing counter with two new task structure fields: dyn_prio and time_slice. This change simplifies the time slice accounting in the kernel, and makes it easy to adjust the dynamic priority for other reasons. For example, a small priority boost can be given to a process which has just completed an I/O operation without increasing its time slice.
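A rough sketch of that separation follows. Only the field names dyn_prio and time_slice come from the description above; the structure, the function names, and the selection rule are simplifying assumptions and should not be read as the code in the actual patch.

```c
#include <stddef.h>

/* Hypothetical, simplified task structure; only the two field names
 * are taken from the description of the patch. */
struct demo_task {
    int dyn_prio;    /* dynamic priority, consulted when choosing a task */
    int time_slice;  /* clock ticks left in the current time slice */
};

/* Timer tick: charge the running task one tick.  Only time_slice
 * changes; the priority is left alone.  Returns 1 once the slice
 * has been used up, 0 otherwise. */
static int demo_tick(struct demo_task *t)
{
    if (t->time_slice > 0)
        t->time_slice--;
    return t->time_slice == 0;
}

/* I/O completion: a small priority boost which does not lengthen
 * the time slice. */
static void demo_io_boost(struct demo_task *t, int boost)
{
    t->dyn_prio += boost;
}

/* Assumed selection rule: among tasks with time left, pick the one
 * with the highest dyn_prio.  Returns NULL if no task has time left. */
static struct demo_task *demo_pick(struct demo_task *tasks, size_t n)
{
    struct demo_task *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tasks[i].time_slice == 0)
            continue;
        if (best == NULL || tasks[i].dyn_prio > best->dyn_prio)
            best = &tasks[i];
    }
    return best;
}
```

With the old single counter field, the same boost would also have handed the process extra run time; keeping the two values apart is what makes the adjustment cheap.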

The new code has been steadily tweaked since its inclusion in the prepatch, mostly through adjustments to the time slice and dynamic priority settings. There have been few complaints, but also few posted benchmark results. And this patch does little to address the difficulties encountered by the current scheduler on SMP systems. Work with the scheduling algorithm is likely to continue for some time.

The kernel development process has been discussed from many angles over the last couple of weeks. Perhaps, at the end of a sometimes difficult year, developers need to ponder how to make things better. Here are a few things that have come up:

  • Where is aio? Ben LaHaise first submitted his asynchronous I/O patches early last year. The AIO code enables user processes to queue up I/O operations directly from their buffers (i.e. without being copied through the kernel) without having to wait for their completion. AIO is a feature that Oracle has wanted for some time, as have other authors of high-performance applications.

    Discussion of the AIO patch on the kernel mailing list has been light, despite the fact that this patch makes deep and significant changes to how things have been done. Ben feels that part of the problem, at least, is the fact that these patches - or at least the part that reserves the AIO system calls - have not been merged into the mainline kernel. So there is no easy and stable platform for people to play with.

    Linus likes the AIO patch, but is not ready to merge it, or reserve system calls, until it has been more thoroughly discussed on the kernel mailing list. The result is a sort of "chicken and egg" standoff where AIO never really seems to move forward.

    One possible solution is this patch from Keith Owens, which makes it easy for kernel patches to use temporary system call numbers. System calls are registered at system boot (or module load) time, and they are exported to user space via a /proc interface. Properly written applications will be able to find the system calls they need, and they will continue to run properly even if those numbers change.

  • Units in the kernel. When somebody talks about "kilobytes," what unit are they really using? "Kilo" traditionally means 10^3 (1000), but, in the computing world, it often means 2^10 (1024) instead. A similar ambiguity exists for the "mega" prefix (10^6 or 2^20) as well. For the most part, people have lived with this fuzziness without trouble, but there are always those who feel that it's better to be exact.

    There is, in fact, a standard for the description of binary multiples. According to this standard, a "kilobyte" of memory is really a "kibibyte", and should be written "KiB". The standard also defines "mebi," "gibi," and so on. These definitions have been around since 1998, but their use has been minimal.

    When these units started showing up in the kernel's Configure.help file, some complaints started rolling in. Not everybody likes these units, to say the least. Eric Raymond, current keeper of Configure.help, has stated that he will continue to follow the published standards unless there is a clear consensus to the contrary. Clear consensus can be a scarce thing on the kernel mailing list, however, and no such consensus seems to have emerged on this issue.

  • Patch management. Low-level grumbling about patches being dropped by Linus (and others) has been a constant linux-kernel feature for a while. Patches sent to Linus often seem to just fall into the void; they are not applied, and no response comes back. Developers will often find that a patch finally goes in after having been submitted, without response, several times. It can be demoralizing for a hacker to be continually updating a patch to track the current kernel releases with no feedback as to whether it will eventually be included or not.

    One idea that occasionally comes up is the use of a patch management system. That was actually tried once, some years ago, but Linus has since stopped using the system. Among other things, says Linus, there is not much use in actually tracking patches over time. If they are not incorporated into the kernel, they go stale in a hurry and can no longer easily be applied. Linus would rather that the job of merging patches with other developments stay with the originator of the patch. It also seems that Linus would rather work with people who will be persistent enough to maintain their patches until they are included, on the theory that these people will continue to maintain the code after inclusion.
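Returning to the temporary system call numbers mentioned in the AIO item: the application side of such a scheme might look something like the sketch below. The format of the /proc file is not described here, so the one-entry-per-line "name number" layout is purely an assumption, as are the function name and the system call names used to exercise it.

```c
#include <stdio.h>
#include <string.h>

/* Search a table of system call registrations for "name".
 * Assumed (hypothetical) format: one "name number" pair per line.
 * Returns the registered number, or -1 if the name is not found. */
static long demo_lookup_syscall(FILE *table, const char *name)
{
    char entry[64];
    long number;

    /* Reading stops at end of file or at the first malformed entry. */
    while (fscanf(table, "%63s %ld", entry, &number) == 2) {
        if (strcmp(entry, name) == 0)
            return number;
    }
    return -1;
}
```

An application would open the exported file once at startup, look up each call it needs by name, and pass the resulting number to syscall(). Since nothing is hard-coded, the program keeps working if the numbers differ on the next boot.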

The patch management issue, in particular, is likely to help drive the continuing success of the alternative kernel trees. Increasingly, one or more of these trees is likely to become a necessary staging area where patches can be tried out before finding their way into the mainline kernel. In fact, Linus says that the multiple trees are one of the strengths of the kernel development process for a number of reasons, one of which is patch management.

The Linux kernel is almost alone in its use of multiple trees as part of the development process. Many projects have stable and development branches, but few have multiple trees on either the stable or development side. It will be interesting to see if the multiple-tree idea proves useful enough to spread more widely in the free software development world.

The new kernel build implementation remains a topic of interest. Eric Raymond has sent out a message on the state of the new config and build system, stating that everything is ready to go whenever Linus is. Keith Owens, meanwhile, has released kbuild 1.12 for the 2.5 kernel. There remains one little problem, however: the new kbuild takes about twice as long to execute a full kernel build. Not surprisingly, the kernel developers are not entirely enthusiastic about this state of affairs. They wait on kernel builds every day and have little taste for a change that makes things far slower.

Keith's response to the complaints is essentially this: the new kbuild fixes a number of problems, especially with regard to handling of dependencies, that exist in the current kbuild system. Correctness comes first, with performance to follow. Keith believes that he can fix the performance problems, fairly quickly, but only after the kbuild code has been integrated into the kernel. Until then, he is busy enough just managing the patch and keeping it current with kernel releases.

    "Linus, you have a choice between a known broken build system and a clean and reliable system, which is slightly slower in mark 1. Please add kbuild 2.5 to the kernel, then I will have time to rewrite the core programs for speed. Mark 2 of the core code will be significantly faster."

There has been no word from Linus. Many kernel developers, however, have not generally had trouble with the existing build system; in their view, the new kbuild should not be merged until the performance problems have been dealt with. Keith has already taken some steps in that direction with a new design for the management of the data used by the kbuild process.

Other patches and updates released this week include:

  • Rusty Russell has posted a new /proc/sys implementation which creates a completely new filesystem for single-value items.

  • Pavel Machek has posted a new software suspend patch for 2.5.1.

  • Jean Tourrilhes continues work on the new wireless driver API; the latest patch adds support for wireless events.

  • kdb 2.0 for 2.4.17 was released by Keith Owens.

  • Dmitri Kassatkine has announced version 0.9pre7 of the affix BlueTooth stack.

  • Jeff Dike has released version 0.54-2.4.17 of the User-mode Linux port. Jeff also states that he has sent a UML port off to Linus for inclusion in 2.5.

  • Arnaldo Carvalho de Melo has posted a tool which plots the dependencies between Linux kernel include files. An example of its output may be seen on the right.

  • Christoph Hellwig has released a DRM 4.0 patch for the 2.4.17 kernel.

  • A new high-resolution timers patch was announced by George Anzinger.

  • The Linux Kernel Source Finder is a new web page, maintained by David Alan Gilbert, containing pointers to the definitive archives for non-x86 Linux kernel sources.

  • Kernel Traffic #148 (December 31) is available.

  • James Bottomley has posted a patch adding support for the NCR voyager architecture.

  • Momchil Velikov has posted a patch improving the performance of the kernel page cache in a number of ways.

  • Rik van Riel has posted a 2.4.17 VM implementation with reverse mapping support. The announcement describes what is in the patch, but people who want to actually run the code should use version 10a instead.

  • A BeOS filesystem implementation for Linux is available from Will Dyson.

  • Andrew Cannon has a filesystem driver for the Radisys RBF filesystem.

  • devfs 199.6 (for 2.4.17) and devfs 205 (2.5.1) were released by Richard Gooch.

  • Zygo Blaxell has posted a CryptoAPI patch for 2.4.17.

  • CML2 1.9.20 has been released by Eric Raymond.

  • The third stable release of USAGI (the "UniverSAl playGround for Ipv6") was announced by Kanda Mitsuru.

Section Editor: Jonathan Corbet


January 3, 2002

Eklektix, Inc. Linux powered! Copyright © 2002 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds