Kernel development

The current kernel release is still 2.4.7. The 2.4.8 prepatch is currently at 2.4.8pre3; it includes the usual collection of fixes, along with the single-use patch from Daniel Phillips which was covered last week. There have been complaints that the 2.4.8pre series is much slower on systems with large amounts of memory; the VM hackers are currently hot on the trail of those problems.

Users of Adaptec adaptors (i.e. your editor, grumble grumble...) on SMP systems were unpleasantly surprised with 2.4.8pre2, which crashed on boot. The check that caused the crash has been removed, but there appears to be a strange problem that still lurks in there somewhere.

Alan Cox's latest patch is 2.4.7ac3. It contains a great many architecture-specific changes; slowly the kernel trees for the various ports are finding their way back toward the mainline. There are also some enhancements for User-Mode Linux and many miscellaneous fixes.

A new kernel API for completion events. It is common in kernel code to set some sort of process in motion, then to go to sleep and wait until that process completes. There are several ways of implementing the "wait for completion" part; which is the proper one to use depends on the specific situation.

Until 2.4.7 came out, one technique used involved semaphores. The initiating process would declare a semaphore as a local variable (i.e. on the stack), starting out in the locked state; the process would do what was needed to arrange for some work to be done, then wait on the semaphore. The code actually doing the work would simply unlock the semaphore when the task was complete.

On the surface, this technique is appealing because it avoids some obvious race conditions. If, for example, the work gets done before the kernel gets around to waiting on the semaphore, it notices that fact and simply doesn't wait. The sleep_on() and wake_up() calls can be much trickier to use correctly in this situation.
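The semaphore technique described above can be sketched in user space. What follows is only an analogy built on POSIX threads and POSIX semaphores, not kernel code; the names `struct request`, `worker()` and `start_and_wait()` are hypothetical and exist purely for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical request shared between the waiter and the worker. */
struct request {
    sem_t done;    /* starts at 0, i.e. in the "locked" state */
    int   result;  /* filled in by the worker */
};

/* The code doing the work: finish the task, then unlock the semaphore. */
static void *worker(void *arg)
{
    struct request *req = arg;

    req->result = 42;       /* stand-in for the real work */
    sem_post(&req->done);   /* "unlock": releases the waiter */
    return NULL;
}

/* The initiating side: set things in motion, then wait. */
int start_and_wait(void)
{
    struct request req;     /* lives on the stack, as in the article */
    pthread_t tid;
    int result;

    if (sem_init(&req.done, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, &req) != 0) {
        sem_destroy(&req.done);
        return -1;
    }

    /* If the worker has already posted, this returns at once
     * instead of sleeping. Retry if a signal interrupts the wait. */
    while (sem_wait(&req.done) != 0 && errno == EINTR)
        ;

    result = req.result;

    /* Join before returning so the worker can never touch the
     * stack object after this frame is gone. */
    pthread_join(tid, NULL);
    sem_destroy(&req.done);
    return result;
}
```

Calling `start_and_wait()` returns the value stored by the worker. Because the semaphore counts, the order in which the two sides run does not matter, which is the property that makes the technique attractive.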
But, as it turns out, there is a race condition here too, which is a result of how the semaphores themselves work. When a semaphore is to be unlocked, the code (1) sets the semaphore itself to the unlocked state, then (2) calls wake_up() to notify any processes that might have been waiting on the semaphore. If the waiter tests the semaphore between those two steps, it will never actually wait, and may well execute the rest of its code before the wake_up() call happens. That is not normally a problem, but, if the semaphore is sitting on a kernel stack somewhere, it could cease to exist before the wake_up() call, which requires data from the semaphore, runs. In other words, it could be working with a pointer into random memory; the technical term for this is "oops."

This particular race is highly unlikely to ever actually happen, but it's still a race. The performance of this approach is also suboptimal, due to the fact that semaphores are optimized for the unlocked case. In this particular situation, the semaphore will almost always be locked.

Linus chose not to change the semaphore implementation (it's "painful as hell"); instead, he created a new interface for the handling of completion events. All a process need do to use this facility is to create and initialize a completion structure:

    struct completion event;
    init_completion(&event);

Then it can set things in motion, and call:

    wait_for_completion(&event);

to sleep until things are done. The task actually doing the work can perform a simple call to

    complete(&event);

and the waiting process wakes up.

It's a relatively straightforward solution, even if changing APIs in the middle of a stable kernel series may look a little strange. If nothing else, the whole affair makes it clear, once again, just how hard it is to avoid race conditions in kernel code.

The first initramfs patch was posted by Alexander Viro this week.
This patch is the implementation of the new 2.5 boot process that was first discussed in the July 12 kernel page. In this scheme, the kernel executable image carries with it a cpio archive containing the contents of the initial root filesystem. That archive is loaded into a ramdisk at boot time, at which point it can be used to continue the system initialization process. The hope is to move much of the kernel initialization code out of kernel space and into this ramdisk. The result is a smaller kernel and more flexibility in how the bootstrap process is set up. For the moment, the tasks that have been moved to user space include:
Heading toward ext3 1.0. ext3 2.4-0.9.5 was released by Andrew Morton. This version continues the work toward a truly stable ext3 journaling filesystem release, fixing a number of bugs. Much work has also gone into performance improvements on a number of fronts. Among other things, synchronous operations happen more quickly; this should make people running large mail systems happy, since many mail transfer agents make heavy use of synchronous directory operations.

Another change in 0.9.5 is the ability to use an external journal. External journals live on a separate device (perhaps a non-volatile RAM device), and, in theory, can speed up the operation of the filesystem. Writes to an external journal should be very quick, and journal operations will not contend with writes to the rest of the disk. The initial performance results with external journals appear to be mixed, however.

Those interested in ext3 may also want to see an older patch announcement from Andrew which contains a detailed explanation of the three journaling modes supported.

Much slower routing performance in 2.4 has been reported by some users. The common factor in these reports is that the people involved are still using the 2.2 ipchains interface to set up their firewalling. The ipchains module in 2.4 carries full connection tracking along with it; most people setting up ipchains rules probably do not need that feature. The solution is to switch to iptables.

Other patches and updates released this week include:
Section Editor: Jonathan Corbet
August 2, 2001