
See also: last week's Kernel page.

Kernel development


The current kernel release is still 2.4.5. The 2.4.6pre6 prepatch came out on June 27, just as this page was going to "press." Sometimes we think Linus does it on purpose. In any case, this prepatch contains another set of fixes, and the resumption of merging from the "ac" patch series (which is currently at 2.4.5ac19).

The 2.2.20 prepatch is up to 2.2.20pre6.

2.5 is coming soon. In a message on page locking, Linus let slip the following:

I don't have any objections to the patch in that case, although it does end up being a 2.5.x issue as far as I'm concerned (and don't worry, 2.5.x looks like it will open in a week or two, so we're not talking about long timeframes).

That message was posted on June 21, meaning that the new development series can be expected at any time. The 2.4 kernel has had an especially long settling period - a full six months if it lasts until July 4. The kernel hackers (and others) are more than ready to have a bleeding edge to live on once again. It's certainly time.

Transitioning to the new kbuild system. The approach of 2.5 has motivated the developers of the new kernel build subsystem to think about how they will effect the transition. The current plans, as last heard from Linus, call for that transition to happen somewhere around the 2.5.2 release. So it's not too soon to be wondering how it will happen.

A draft transition plan has been posted to the kbuild list. The developers have decided that the first step will be to replace the configuration system with CML2. That code is stable, and it appears that an important enhancement will not be implemented soon:

For you CML2 Adventure fans, Eric has decided not to implement monster combat at this time. On the other hand, the dungeon walls may soon develop graffiti.

The new Makefile scheme implemented by Keith Owens is just about ready, but there are a couple of loose ends yet to be taken care of. So the Makefiles will come second, as a large patch of their own. A third, cleanup patch will follow later. Of course, all this is subject to acceptance by Linus...

Memory management I: the early flush patch. Linux, like most Unix-like systems (and most systems in general), does not immediately flush data written by processes to the disk. Data, instead, is cached in order to improve performance. By delaying writes, the system can fold multiple operations into a single write to the disk. Performance can also be improved by allowing writes to consecutive disk sectors to accumulate, so that they can all happen at once.
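
The effect is easy to see from user space. The following minimal program (ours, purely illustrative) writes 1MB quickly - the data goes only as far as the cache - then blocks in fsync() until the data has actually reached the disk:

    /* A minimal user-space demonstration of write caching.  The file
     * name and sizes are arbitrary.  The write() calls below return as
     * soon as the data is in the kernel's cache; fsync() blocks until
     * that data has actually been written out. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        int i;
        int fd = open("/tmp/cache-demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        memset(buf, 'x', sizeof(buf));

        for (i = 0; i < 256; i++) {     /* 1MB: fast, it's only cached */
            if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf)) {
                perror("write");
                return 1;
            }
        }

        if (fsync(fd) < 0)              /* now wait for the real I/O */
            perror("fsync");

        close(fd);
        return 0;
    }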

In general, this approach works well. Recently, however, some developers have begun to question one aspect of write caching: how the system decides that it is time to write cached data to disk. Currently, the decision to write comes about in two ways: (1) the system needs memory for other purposes, or (2) the data has been sitting in memory for too long. Neither case is optimal, as it turns out.
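
The decision logic amounts to something like the following sketch. All of the names here (memory_is_short(), MAX_BUFFER_AGE, and so on) are ours, invented for illustration; they are not actual kernel identifiers.

    /* A simplified model of the two traditional writeback triggers.
     * All names are invented for illustration; none of them are real
     * 2.4 kernel identifiers. */
    #include <stdio.h>
    #include <time.h>

    #define MAX_BUFFER_AGE 30   /* seconds dirty data may sit in RAM */

    struct dirty_buffer {
        time_t dirtied_at;      /* when the data was last modified */
    };

    static int memory_is_short(void)
    {
        return 0;               /* stub for trigger (1): memory pressure */
    }

    static int should_write_out(const struct dirty_buffer *b)
    {
        if (memory_is_short())  /* (1) memory needed for other purposes */
            return 1;
        /* (2) the data has been sitting in memory for too long */
        return time(NULL) - b->dirtied_at >= MAX_BUFFER_AGE;
    }

    int main(void)
    {
        struct dirty_buffer b;

        b.dirtied_at = time(NULL) - 45;    /* dirtied 45 seconds ago */
        printf("flush? %s\n", should_write_out(&b) ? "yes" : "no");
        return 0;
    }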

Using memory for cache until the system needs it for something else seems like a sensible policy, and it often is. In the real world, however, memory pressure is often associated with a high disk I/O load. So if the system waits until memory is short to write cache to disk, it ends up increasing the load on memory just when it's already at a high point. The result can be even worse memory pressure, an overheating disk, and possibly a thrashing system.

Writing out cache on a regular basis (the second case, above) may or may not create I/O at a bad time. It can, however, create suboptimal behavior on laptops, or any other system where disks have been set to spin down after an idle period. Activity which generates data to write to disk has a good chance of having already caused the disk to spin up. If the actual write of the data is delayed up to 30 seconds, the spindown of the disk will be delayed accordingly.

Both cases suggest that it might often make sense to write out cached data quickly, especially if the disks are not doing much at the time. Daniel Phillips has put together a patch which attempts to do just that. Daniel's patches are always interesting to read, since he includes a detailed and clear description of what he is doing; this one is no exception.

Essentially, the patch sets up a new polling loop within the kernel which runs every 100ms. At each poll, if the I/O backlog is small, a flush of cached data is initiated. That flush may not write out absolutely everything; it tries to fill up the I/O queues while still leaving some slack, in case a burst of activity comes along. The patch is relatively small and simple, but it has the potential to improve performance for a number of different workloads. And getting data written to disk sooner doesn't hurt either. (Those who want to try out the patch should see the updated version, which contains a few improvements.)
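
In outline, the mechanism looks something like the sketch below. This is our rough rendition of the idea, not code from the patch itself; io_backlog(), flush_some_buffers(), and the threshold values are all invented for illustration.

    /* A rough sketch of the early flush idea - not the actual patch.
     * io_backlog(), flush_some_buffers(), and the thresholds are
     * invented for illustration. */
    #include <unistd.h>

    #define POLL_INTERVAL_US  100000  /* wake up every 100ms */
    #define BACKLOG_LOW_WATER     16  /* "disk is nearly idle" */
    #define QUEUE_SLACK           32  /* room left for new requests */

    static int io_backlog(void) { return 0; }               /* stub */
    static void flush_some_buffers(int max) { (void) max; } /* stub */

    int main(void)
    {
        /* Runs forever, like the kernel loop it models. */
        for (;;) {
            usleep(POLL_INTERVAL_US);

            /* Only flush opportunistically, when the disk is idle. */
            if (io_backlog() >= BACKLOG_LOW_WATER)
                continue;

            /* Fill the I/O queues, but leave some slack in case a
             * burst of real work arrives in the meantime. */
            flush_some_buffers(QUEUE_SLACK);
        }
    }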

Memory management II: the VM requirements document. Jason McMullan recently posted a rant (his word) on how work with the VM subsystem is going. According to him, people have been bashing on virtual memory without a strong idea of just what they are trying to accomplish. He would like to see a summary of the motivations behind the VM work.

What if the VM were your little Tuxigachi. A little critter that lived in your computer, handling all the memory, swap, and cache management. What would be the positive and negative feedback you'd give him to tell him how well he's doing VM?

The ensuing conversation remained calm, despite the fact that the VM hackers did not entirely agree with his summary of their work. Jason followed up a few days later with a draft VM requirements document analyzing the constraints on memory management for a number of system types, from embedded systems to servers. In particular, he looked at caching and swapping behavior. It boils down to a few rules of thumb, including:

  • Do not write to slow "packeted" devices until memory is needed for processes on the system. These devices include flash memory and laptop disk drives. The purposes here are to get the best performance out of the devices, to avoid excessive wear on flash memory, and to keep laptop drives spun down as long as possible.

  • Keep "packeted" devices idle for as long as possible. This is an extension of the previous point; laptop disks should be kept spun down until you really need to bring them up.

  • Never cache reads from very fast devices. On embedded systems with flash memory, for example, reads are almost immediate, and caching them is a waste of RAM.

  • Keep running processes as fully in memory as possible, thus avoiding swap traffic. Idle processes, instead, can be forced out to make room.

There were also a couple of points regarding cache size which were controversial and are likely to be revisited.
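
To make those rules concrete, here is one hypothetical way they might look as code. The device flags and function names are invented for this article; nothing like them exists in the actual VM subsystem.

    /* A hypothetical encoding of the rules of thumb above.  The device
     * flags and function names are invented; nothing like this exists
     * in the kernel. */
    #include <stdio.h>

    struct vm_device {
        int packeted;    /* flash, spin-down laptop disk: batch writes */
        int very_fast;   /* reads are nearly free (e.g. on-board flash) */
    };

    /* Rules 1 and 2: hold dirty data for packeted devices until memory
     * is actually needed, keeping the device idle as long as possible. */
    static int may_write_now(const struct vm_device *dev, int memory_needed)
    {
        return !dev->packeted || memory_needed;
    }

    /* Rule 3: don't spend RAM caching reads that cost almost nothing
     * to repeat. */
    static int should_cache_read(const struct vm_device *dev)
    {
        return !dev->very_fast;
    }

    /* Rule 4: when pages must be reclaimed, take them from idle
     * processes before touching the working set of a running one. */
    static int good_eviction_victim(int process_is_idle)
    {
        return process_is_idle;
    }

    int main(void)
    {
        struct vm_device flash = { 1, 1 };   /* a flash-backed device */

        printf("write now? %d  cache reads? %d  evict idle? %d\n",
               may_write_now(&flash, 0), should_cache_read(&flash),
               good_eviction_victim(1));
        return 0;
    }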

This sort of analysis, of course, is just a first step. Turning the above items into actual strategies for the VM subsystem, and from there into code, will take some time. But it is a useful exercise in the ongoing effort to improve Linux memory management. (See also: Rik van Riel's FREENIX paper on Linux memory management, available in PDF format from his lectures page).

Section Editor: Jonathan Corbet


June 28, 2001
