Kernel development


The current development kernel release is 2.5.6, which was released on March 8. The final release added little to the prepatches; the main feature of this release from a user's point of view remains the inclusion of IBM's JFS journaling filesystem.

The first 2.5.7 prepatch has been released. It includes Rusty Russell's fast user-space semaphore patch ("futexes"), a thrashup of the VLAN code, the new wireless driver API, a redesigned video device implementation, and numerous fixes and updates.

Dave Jones has released no "dj" patches over the last week. He has offered excuses like moving into a new house.

Guillaume Boissiere's latest 2.5 status summary is available.

The current stable kernel release is 2.4.18. The current 2.4.19 prepatch from Marcelo is 2.4.19-pre3. Along with the usual array of fixes and updates it includes the "new" IDE code - in its original form, not the increasingly reworked version found in the 2.5 kernel. In fact, the -pre3 version is missing some important fixes that went into 2.5 early on - it still has the bug that caused 2.5 to destroy filesystems. There have been no reports of corrupted filesystems with this prepatch, but it should be approached with some care anyway.

Alan Cox's latest prepatch is 2.4.19-pre2-ac4. There is a long list of fixes, but no amazing new features.

Alan has also announced the first 2.2.21 release candidate.

Other kernel trees. The day may yet come when the number of available kernel trees exceeds the number of Linux users...

  • Andrea Arcangeli's latest is 2.4.19-pre3-aa1. It adds his latest VM implementation (vm-31), the X86-64 port, User-mode Linux, and a number of fixes.

  • J.A. Magallon has released 2.4.19-pre2-jam3 with the latest VM code, the O(1) scheduler, the IDE patch, and other performance-oriented fixes.

  • Jörg Prante has released 2.4.19-pre2-jp7, which includes ALSA, the reverse mapping VM, the O(1) scheduler, the preempt patch, the IDE patch, XFS, JFS, various crypto patches, and much more.

  • 2.4.19-pre2-ac4-xfs-shawn10 from Shawn Starr includes XFS, the reverse mapping VM, Jan Kara's reworked quota system, and more.

  • A new entry this week is 2.4.18-mcp3-WOLK from Marc-Christian Petersen, which is inspired by the FOLK patch. It throws in Win4Lin, the preempt patch, the international crypto patch, the IDE patch, JFS, XFS, FreeS/WAN, NWFS, lm_sensors, and a great many other patches.

Linus on BitKeeper. It was already clear, of course, that Linus is not bothered by the BitKeeper license. For anybody who didn't know that, however, he stated his views this week:

And I personally refuse to use inferior tools because of ideology. In fact, I will go as far as saying that making excuses for bad tools due to ideology is _stupid_, and people who do that think with their gonads, not their brains.

Most of the developers seem to be at ease with his position. It is worth pondering, however, why so many of us insisted on using Linux systems in the early 1990s, when Linux was still clearly inferior to the numerous proprietary Unix systems available at the time. Without a certain amount of "gonad thinking," Linux might not have come so far so quickly.

Meanwhile, there has been a small discussion of what features are offered by BitKeeper that really make it worthwhile for the kernel developers. Here's a partial list:

  • Much nicer merging of patches. The three-way merge tool is seriously slick. But the ability to carry merges forward through multiple patch sets is just as important. Merging of patches can be a painful task; having to do it only once can be a real relief.

  • The ability to check in entire patch sets as a single operation.

  • The distributed repository feature is a key to the whole thing. BitKeeper works well with the kernel development style by allowing each developer to set up independent trees and facilitating the movement of patches between those trees.

  • Understanding of directories and operations like renaming; CVS does not handle these well at all.
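The rule at the heart of any three-way merge is simple: compare each side against the common ancestor and keep whichever side actually changed. The Python sketch below is not BitKeeper's code. It assumes the three versions are already aligned line for line (real tools such as diff3 first run a diff to align them, which is the hard part), and `merge3()` is a hypothetical name.

```python
def merge3(base, ours, theirs):
    """Line-by-line three-way merge.

    ASSUMPTION: the three versions are aligned one-to-one (only in-place
    line edits, no insertions or deletions). Returns (merged, conflicts),
    where conflicts lists the indexes of lines both sides changed differently.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles aligned versions")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # unchanged, or both sides made the same change
            merged.append(o)
        elif o == b:      # only "theirs" touched this line
            merged.append(t)
        elif t == b:      # only "ours" touched this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
            conflicts.append(i)
    return merged, conflicts


base = ["a", "b", "c"]
ours = ["a", "B", "c"]      # one developer edits line 1
theirs = ["a", "b", "C"]    # another edits line 2
merged, conflicts = merge3(base, ours, theirs)
print(merged, conflicts)
```

Because each side's change is measured against the ancestor, independent edits combine without any manual work; only overlapping edits are flagged.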

There are developers out there who are talking about adding these features to the existing free source management systems. It's a nontrivial task, however; the first release is likely to be some time in the future. (Then again, Hans Reiser wants to incorporate version control into the filesystem, and plans to do so with a future ReiserFS release. "Version control has to become just another expected filesystem feature, and one that is so transparent to users that Mom uses it without fear.")

The hostile takeover of the 2.5 IDE code is now officially complete: Martin Dalecki's IDE 18 patch changed the MAINTAINERS file to list him as the person in charge of that subsystem. There were no immediate complaints, but things heated up a bit when he released IDE 19. Therein were comments like:

Apply Pavels Macheks patch for suspend support. Whatever some persons argue that it's not fully implemented, I think that we are in development series right now. I don't buy the mock-up examples for problems with either outdated or broken hardware. Micro Drives are for example expected to be drop in replacements for CF cards in digital cameras and I would rather expect them to be very tolerant about the driver in front of them.

Martin has also been heard to say: "Breakage is the price you have to pay for advancements."

It turns out that some kernel developers are not entirely pleased with the idea of "breakage" in the IDE code - they like their disks to work. There is a feeling that it is better to follow the standards than to expect drives "to be very tolerant about the driver in front of them." Few people have come out in defense of the existing code, but some feel that the current approach to "cleaning up" the IDE code is careless to the point of negligence.

The discussion, in fact, involved some of the most unpleasant personal attacks seen on linux-kernel for some time. It also appears to have changed little; Martin continues to crank out IDE patches, and Linus continues to accept them. Perhaps Martin has received a message, however, that standards compliance and stability are important. When it comes to disks, people are not willing to pay for their advancements with any great amount of breakage.

On the future of IDE taskfile commands. The IDE taskfile ioctl (which allows passing arbitrary low-level commands to IDE peripherals) has generally been the source of no end of inflammatory discussions in its own right. Compared to the other IDE threads, however, the current taskfile discussion seems like a new height of civility and technical content.

The issue is not whether low-level commands should be allowed - there is widespread agreement that this capability is occasionally required. Diagnostic code needs it, if nothing else. But when Andre Hedrick first implemented the taskfile capability, he included an IDE command parser to ensure that all commands passed to the drives were legal according to the standards. There has never been a consensus on whether this sort of command filtering is appropriate.

Those in favor of filtering point out that the consequences of executing a malformed IDE command can be severe: loss of data or, in the worst case, having to throw away a brick that was once a working drive. Filtering can thus protect against both programming errors and deliberate attacks. Proponents of filtering also see it as a possible way of defeating future "digital rights management" schemes which may depend on new, undocumented IDE commands.

The opposition points out that most drives have some unique, vendor-specific commands. Unless somebody wants to build (and maintain) a table of all such commands, any filtering is certain to block legitimate commands for some users. The protection against attacks is seen as being weak at best, since a process which is able to execute taskfile commands can also just go and pound on the I/O ports directly. And dealing with DRM schemes is probably not going to be so simple.
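To make the trade-off concrete, here is a minimal table-driven filter of the kind the two camps are arguing about. It is a sketch, not Andre Hedrick's parser: the opcodes are standard ATA command codes, but the allowed set is an arbitrary example policy, `filter_taskfile()` is a hypothetical name, and a real filter would also have to check the other taskfile registers, not just the command byte.

```python
# Standard ATA command opcodes.
READ_SECTORS = 0x20
WRITE_SECTORS = 0x30
FLUSH_CACHE = 0xE7
IDENTIFY_DEVICE = 0xEC
SECURITY_ERASE_UNIT = 0xF4   # destroys all user data on the drive

# HYPOTHETICAL: stands in for any vendor-specific opcode the table omits.
HYPOTHETICAL_VENDOR_CMD = 0x8F

# An example allowlist policy; not the one used by any real kernel.
ALLOWED = {READ_SECTORS, WRITE_SECTORS, FLUSH_CACHE, IDENTIFY_DEVICE}


def filter_taskfile(opcode, allowed=ALLOWED):
    """Return True if the command byte may be passed on to the drive."""
    return opcode in allowed


for op in (IDENTIFY_DEVICE, SECURITY_ERASE_UNIT, HYPOTHETICAL_VENDOR_CMD):
    print(f"{op:#04x}: {'pass' if filter_taskfile(op) else 'blocked'}")
```

Both arguments show up at once: the destructive erase command is stopped, but so is the vendor command some user legitimately needed, unless someone keeps the table up to date.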

For all these reasons, Linus has generally been against IDE command filtering. He also points out that the IDE layer should not be performing any such filtering in any case. The IDE layer, after all, is a driver for the IDE host controller; the commands to be filtered are, instead, aimed at IDE disks. Linus compares IDE filtering to having a network adapter driver perform validity testing and filtering for network protocols.

There are some things that need to be done with low-level commands, however. At a minimum, the buffers they use must be verified. But it would also be a very good idea to better sequence their execution with all of the other IDE commands that may be running at the same time.

So Linus has proposed a new scheme for the handling (and possible filtering) of low-level IDE commands. These commands would be moved out of the IDE driver, into a separate loadable module. Paranoid administrators who do not want those commands executed at all could simply remove the module from their systems entirely. The rest could configure a module which did as much (or little) filtering as they wanted.

This module would not talk directly with the IDE subsystem. Instead, any low-level commands would be run through the drive's request queue along with all the other drive operations. This scheme forces low-level commands to be sequenced along with any other disk activity, and should help ensure that they are executed in a way that doesn't interfere with the other things the system is trying to do.
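A rough model of Linus's proposal follows. Everything in it (the class, the method names, the policy hook) is hypothetical; it only illustrates the two ideas in the proposal: taskfile commands pass through the same per-drive queue as ordinary I/O, and with no taskfile module loaded they are refused outright. For simplicity the queue is strictly first-in, first-out, whereas a real block-layer queue may reorder normal requests.

```python
from collections import deque


class RequestQueue:
    """Per-drive request queue; raw taskfile commands share it with normal I/O."""

    def __init__(self):
        self._pending = deque()
        self.executed = []
        # None models "taskfile module not loaded"; otherwise a policy
        # function taking an opcode and returning True to allow it.
        self.taskfile_filter = None

    def submit_io(self, kind, sector):
        self._pending.append((kind, sector))

    def submit_taskfile(self, opcode):
        if self.taskfile_filter is None:
            raise PermissionError("taskfile module not loaded")
        if not self.taskfile_filter(opcode):
            raise PermissionError(f"command {opcode:#04x} rejected by filter")
        # Queued like any other request, so it cannot overlap in-flight I/O.
        self._pending.append(("taskfile", opcode))

    def run(self):
        """Execute everything queued, one request at a time, in order."""
        while self._pending:
            self.executed.append(self._pending.popleft())


q = RequestQueue()
q.submit_io("read", 100)
q.taskfile_filter = lambda opcode: True   # an admin who wants no filtering at all
q.submit_taskfile(0xEC)                   # ATA IDENTIFY DEVICE
q.submit_io("write", 200)
q.run()
print(q.executed)
```

The paranoid administrator's configuration is simply `taskfile_filter = None`; a stricter site could plug in an allowlist function instead.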

There have been very few complaints about this proposal. Its implementation would take some work, but a solution to the problem of taskfile commands and filtering may just be in sight.

Going for the fastest kernel compile. Martin Bligh posted an interesting note this week. He started with the 2.4.18 kernel and a 16-node NUMA system using 700MHz P3 processors. With that system, he was able to build a kernel in 47 seconds, which would make most of us reasonably happy. Martin wasn't satisfied with that, though, so he applied a series of patches to bring that time down:

  • Various NUMA memory allocation fixes: 27 seconds.
  • The O(1) scheduler from 2.5: 25 seconds.
  • A NUMA-oriented scheduler patch: 24 seconds.
  • A dcache patch which improves cache behavior: 23 seconds.

Compiling a kernel in 23 seconds isn't bad - it looks like a record.

Records, though, are meant to be broken. So Anton Blanchard rose to the challenge with a 24-node "logical partition" on a PowerPC64 system running a patched version of 2.5.6. Building a kernel with the same configuration as Martin's, above, he got the job done in 10.3 seconds. That will be a hard performance to beat, but somebody, somewhere, is certainly working on it.
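For the record, the speedups work out as follows; each of Martin's times presumably includes the patches listed above it. Of the 24 seconds he shaved off, 20 came from the NUMA memory allocation fixes alone. Anton's run used different hardware and a different kernel, so the last ratio compares two setups rather than measuring one change. The `speedup()` helper is just a name chosen here.

```python
def speedup(before, after):
    """Ratio of the old build time to the new one."""
    return before / after


baseline = 47.0  # seconds: stock 2.4.18 on the 16-node NUMA system
steps = [
    ("NUMA memory allocation fixes", 27.0),
    ("O(1) scheduler", 25.0),
    ("NUMA-oriented scheduler", 24.0),
    ("dcache patch", 23.0),
]

for name, seconds in steps:
    print(f"{name:30} {seconds:5.1f} s  {speedup(baseline, seconds):.2f}x")

# Anton's PowerPC64 logical partition versus Martin's best time.
print(f"{'PowerPC64 LPAR, 2.5.6':30} {10.3:5.1f} s  "
      f"{speedup(23.0, 10.3):.2f}x vs. 23 s")
```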

Other patches and updates released this week include:

Core kernel code:

  • Robert Love has posted a new version of his system call allowing processes to set their processor affinity.

  • A new version of the delayed allocation patch has been posted by Andrew Morton. He might just be looking for people to try it out: "Does anyone know what 'CFT' means? It means 'call for testers'. It doesn't mean 'woo-hoo, it'll be neat when that's merged <delete>'. It means 'help, help - there's no point in just one guy testing this'."

  • Larry Kessler has released an implementation of POSIX event logging for the 2.5.6 and 2.4.18 kernels.

  • Rik van Riel has released a kernel with the reverse mapping VM in RPM format.

  • Erich Focht has posted a new version of his NUMA scheduler.
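Processor affinity is easy to experiment with from user space. The sketch below uses the `sched_setaffinity()`/`sched_getaffinity()` interface as it exists in current Linux kernels, via Python's `os.sched_setaffinity()` and `os.sched_getaffinity()` (Linux only); whether this week's version of Robert's patch has exactly this interface is not something we have checked. `pin_to_cpu()` is a hypothetical helper.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process (pid 0 means "self") to a single CPU
    and return the affinity mask the kernel now reports."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


original = os.sched_getaffinity(0)   # CPUs we are currently allowed to use
cpu = min(original)
print("pinned to", pin_to_cpu(cpu))
os.sched_setaffinity(0, original)    # undo the pinning
```

Picking a CPU from the current mask matters: asking for a CPU outside the allowed set fails with `OSError`.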

Development tools:

  • The Linux Test Project ltp-20020307 release is available. Numerous new tests have been added.

  • Keith Owens has released kdb 2.1-2.4.18 for the Sparc64 architecture.

Device drivers:

  • The seventh test release of the new Tigon3 driver has been announced by David Miller.

  • A new beta Conexant HCF "linmodem" driver has been announced by Marc Boucher.

Filesystems and related:

  • Kevin Corry has announced version 0.9.2 of the Enterprise Volume Management System.

  • A new, vastly reworked disk quota system has been posted by Jan Kara.

  • Steve Best has announced the release of JFS 1.0.16.

  • Andreas Gruenbacher has released version 0.8.20 of the access control list patch.

Miscellaneous:

  • Rusty Russell has posted a fast userspace read/write lock ("furwock") implementation based on futexes. He has also posted an explanation of how futexes work.
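The key trick behind futexes is that an uncontended lock never enters the kernel: user space manipulates a shared integer with atomic instructions, and calls into the kernel only to sleep or to wake a sleeper. Python has neither atomic instructions nor real futexes, so this sketch simulates both with `threading` primitives. The `SimulatedFutexWord` and `FutexMutex` classes are hypothetical, and the three-state mutex follows the design later described in Ulrich Drepper's paper "Futexes Are Tricky" rather than Rusty's furwock code. The crucial detail is in `futex_wait()`: the "kernel" puts the caller to sleep only if the word still holds the expected value, which is what prevents lost wakeups.

```python
import threading


class SimulatedFutexWord:
    """An integer in 'shared memory' plus a simulated kernel wait queue.
    _atomic stands in for atomic CPU instructions, _queue for the kernel's
    futex wait queue; 'syscalls' counts entries into the 'kernel'."""

    def __init__(self):
        self.value = 0
        self.syscalls = 0
        self._atomic = threading.Lock()
        self._queue = threading.Condition()

    # User-space atomic operations: no kernel entry.
    def cmpxchg(self, expected, new):
        with self._atomic:
            old = self.value
            if old == expected:
                self.value = new
            return old

    def xchg(self, new):
        with self._atomic:
            old, self.value = self.value, new
            return old

    def fetch_sub(self, n):
        with self._atomic:
            old = self.value
            self.value = old - n
            return old

    def store(self, new):
        with self._atomic:
            self.value = new

    # Simulated system calls.
    def futex_wait(self, expected):
        with self._queue:
            self.syscalls += 1
            with self._atomic:
                still_expected = self.value == expected
            if still_expected:       # sleep only if nobody changed the word
                self._queue.wait()

    def futex_wake(self, count):
        with self._queue:
            self.syscalls += 1
            self._queue.notify(count)


class FutexMutex:
    """Word states: 0 unlocked, 1 locked, 2 locked with possible waiters."""

    def __init__(self):
        self.word = SimulatedFutexWord()

    def lock(self):
        c = self.word.cmpxchg(0, 1)
        if c == 0:
            return                   # uncontended fast path: no syscall
        if c != 2:
            c = self.word.xchg(2)    # record that someone is waiting
        while c != 0:
            self.word.futex_wait(2)
            c = self.word.xchg(2)

    def unlock(self):
        if self.word.fetch_sub(1) != 1:   # old value 2: there may be sleepers
            self.word.store(0)
            self.word.futex_wake(1)


m = FutexMutex()
m.lock()
m.unlock()
print("syscalls for an uncontended lock/unlock:", m.word.syscalls)
```

Only when threads actually collide does the simulated kernel get involved; the common case costs one atomic operation on each side.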

Networking:

  • This week's release of the Affix Bluetooth stack is version 0_94.

  • Alexander Viro has posted an implementation of the "nfsd" filesystem - a new way of communicating with the NFS server process to perform tasks like exporting filesystems.

Ports:

  • James Bottomley has posted a new version of his port to the NCR Voyager architecture.

Section Editor: Jonathan Corbet


March 14, 2002