Kernel development

The current development kernel is 2.5.10, which was released on April 24. As per Linus's new style of operation (see below), this patch is relatively small, and was not preceded by any prepatches. It consists mostly of driver updates and a couple of fixes for 2.5.9 problems.

2.5.9, also released without prepatches, contained quite a few architecture updates, ongoing USB work, the usual IDE and VFS updates, and a new interrupt balancing scheme.

The current prepatch from Dave Jones is 2.5.9-dj1; it adds more fixes and a SCSI subsystem change that is likely to break a number of drivers.

The current stable kernel release is 2.4.18. Marcelo has released no 2.4.19 prepatches in the last week.

Alan Cox's latest prepatch is 2.4.19-pre7-ac2, which contains a bunch of I2O work and numerous fixes.

There will be a kernel developers' summit held in Ottawa, just before the Ottawa Linux Symposium. Like last year's event, this summit will be an invitation-only affair. No agenda has yet been released.

Smarter interrupt balancing is now part of the 2.5 kernel - at least, for the x86 architecture. Modern interrupt controllers have long had the ability to direct interrupts to specific processors on SMP systems. Thus far, Linux has made relatively little use of that capability. 2.5.9, however, included a small patch by Ingo Molnar which changes things.

At most once every "jiffy" (1/100 of a second on the x86), the interrupt management code will attempt to balance each interrupt that it handles. This code will now select a target processor by scanning in a random direction for a CPU that is "idle enough" - one which has been idle for at least one clock tick. In the absence of an idle processor, the code will most likely not change the processor handling the interrupt.

The changes make sense. In general, it is better to have the same processor deal with any specific interrupt, in order to take advantage of data in the processor cache. But, as the scheduler gets better at keeping processes from moving between processors (again, for cache reasons), it is a good idea to direct other work away from busy processors. The performance benefits from balancing interrupts in this manner are probably not huge, but every bit helps.
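
For the curious, the selection step can be mimicked in a few lines of user-space Python. This is only a sketch of the idea as described above, not Ingo's patch: the names `pick_target_cpu`, `idle_for`, and `min_idle` are invented for illustration, idle time is in arbitrary units, and the "most likely not change" case is simplified to "never change."

```python
import random

def pick_target_cpu(current, idle_for, min_idle=1, rng=random):
    """Pick the CPU that should handle an interrupt next.

    current  -- index of the CPU handling the interrupt now
    idle_for -- idle_for[i] is how long CPU i has been idle (0 means busy);
                a hypothetical stand-in for the kernel's per-CPU idle data
    min_idle -- assumed threshold for "idle enough"
    """
    ncpus = len(idle_for)
    direction = rng.choice((1, -1))      # scan in a random direction
    cpu = current
    for _ in range(ncpus - 1):           # look at every other CPU once
        cpu = (cpu + direction) % ncpus
        if idle_for[cpu] >= min_idle:
            return cpu                   # first "idle enough" CPU wins
    return current                       # nobody is idle: leave it alone

# CPU 2 is the only idle processor, so it is chosen in either direction.
print(pick_target_cpu(0, [0, 0, 3, 0]))
```

Leaving the interrupt where it is when every processor is busy is what preserves the cache benefits mentioned above.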

What do you call a USB "device" - a computer (such as a PDA) which attaches to a USB bus as a device, rather than as a host computer? The standards use the term "device," but, as discussed here over the last few weeks, Linus (along with others) is not comfortable with that term. A USB "device driver" is commonly understood to be something that runs on a host computer, after all.

Terms like "target," "slave," and "client" have been thrown around. The leading contender now, however, may well be "gadget." It may seem relatively non-technical, but it gets the idea across. Don't be surprised if the kernel acquires a set of gadget drivers in the near future.

On the proper splitting of block I/O operations. The 2.5 development series has seen a great deal of work on the block I/O subsystem. One of the goals of that work has been to address a performance problem found in 2.4 (and prior) kernels: all block I/O transfers were split into very small blocks. An application (or filesystem) may write large chunks of data, but the block I/O code would split those large transfers into single blocks before passing the request (now multiple requests) on to the driver. The driver can join those chunks back together, but the "lots of small blocks" nature of the 2.4 block subsystem remains a drag on performance.

So one of the first things done in 2.5 was to make the block code smarter, having it pass large requests through to the low-level drivers intact. It turns out, however, that this approach is not without problems of its own.

Consider the challenge faced by the EVMS project, which is building a fancy volume management scheme. An EVMS volume looks like a disk, and can receive large requests from the block I/O layer. Internally, however, that request may have to be handled with operations involving multiple drives. Thus, the lower layers may have to split up the I/O requests that the upper layers have so carefully kept intact.
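
To see why splitting is unavoidable, consider a purely hypothetical volume striped across several drives in fixed-size chunks. The sketch below is not EVMS code; the function name, the striping layout, and the chunk size are all assumptions made for illustration. It shows how one large request turns into several per-drive pieces.

```python
def split_request(start, length, chunk, ndrives):
    """Split a request on a striped volume into per-drive pieces.

    start, length -- the request, in sectors, on the virtual volume
    chunk         -- assumed stripe chunk size, in sectors
    ndrives       -- assumed number of underlying drives

    Returns a list of (drive, drive_sector, count) tuples.
    """
    pieces = []
    end = start + length
    while start < end:
        stripe = start // chunk                  # which chunk of the volume
        offset = start % chunk                   # position inside that chunk
        count = min(chunk - offset, end - start) # stop at the chunk boundary
        drive = stripe % ndrives                 # chunks rotate across drives
        drive_sector = (stripe // ndrives) * chunk + offset
        pieces.append((drive, drive_sector, count))
        start += count
    return pieces

# An 8-sector request starting at sector 6, with 4-sector chunks on two
# drives, crosses two chunk boundaries and becomes three pieces.
print(split_request(6, 8, 4, 2))
```

Each piece needs its own bookkeeping structure, which is where the memory allocation problem discussed below comes from.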

The EVMS folks have run into some practical difficulties in handling this splitting. There are, in fact, some serious traps to avoid in performing this sort of operation. Splitting a block I/O request can require memory - but what happens when the system is out of memory, and the I/O request was generated in order to free pages? That sort of scenario can lead to deadlocks, grumpy users, and further declines in Linux stock prices.

So how does one deal with requests that need splitting? A few possibilities have been raised:

Keep aside a private pool of memory for the splitting of block I/O requests. EVMS has an implementation of a private pool which works now, but this approach is seen as a wasteful duplication of code. It can also be hard to guarantee that sufficient memory will be available when it is needed.
Have each device (physical or virtual) record a maximum I/O size that it can handle. This maximum could be set to the largest size which does not require splitting of requests, and the problem goes away. The new problem, of course, is that this approach looks much like the 2.4 scheme that Jens Axboe and others worked so hard to eliminate.
Provide a callback into the low-level drivers whereby the block I/O layer could ask how large each request should be. Given information about which blocks are to be transferred, the low-level driver could calculate exactly how large the request could be before it would have to be split. This technique would produce optimal request sizes, but at a cost of increasing the amount of computation for every block I/O operation. This cost would be a complete loss most of the time, since most block devices do not have variable maximum request sizes.
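
The difference between the last two options can be illustrated with a small sketch. Everything here is an assumption made for the example: the helper names are invented, and the device is imagined to have a boundary every `chunk` sectors which no request may cross. A fixed maximum must be small enough to be safe at any starting sector, while a callback can answer for the specific sectors involved.

```python
def carve(start, length, limit_fn):
    """Carve a transfer into requests, asking limit_fn(start) for the
    largest request allowed at each starting sector."""
    requests = []
    end = start + length
    while start < end:
        count = min(limit_fn(start), end - start)
        requests.append((start, count))
        start += count
    return requests

CHUNK = 4  # assumed boundary spacing, in sectors

def fixed_limit(start):
    # Fixed per-device maximum: one sector is the only size that can
    # never straddle a boundary, whatever the starting sector.
    return 1

def callback_limit(start):
    # Per-request callback: the distance to the next boundary.
    return CHUNK - (start % CHUNK)

print(len(carve(6, 8, fixed_limit)))     # many small requests
print(carve(6, 8, callback_limit))       # fewer, boundary-aligned requests
```

The callback version produces far fewer requests, but `callback_limit` runs once for every request built, which is the per-operation cost mentioned above.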

No generally-accepted solution has emerged as of this writing.

The rest of the BitKeeper story. This week's Front Page looks at the latest BitKeeper debate as a disagreement over BitKeeper's non-free license. It turns out, though, that licensing is not the full story; there is some concern about how patches are getting into the mainline tree, and how BitKeeper may be affecting the development process.

Consider another posting from Daniel Phillips:

Those who now choose to carry out their development using the patch+email method, and prefer to submit everything for discussion on lkml before it gets included are now largely out of the loop. Things just seem to *appear* in the tree now, without much fanfare. That's my impression.
Rather than Linux development becoming more open, as I'd hoped with the advent of Bitkeeper, it seems to be turning more in the direction of becoming a closed club.

Daniel's fear, thus, is that BitKeeper is helping to reduce the openness of kernel development by providing a sort of back channel through which many patches now pass. Not everybody agrees with that assessment, naturally. Linus, for example, states: "I'm not getting changes from any new magical BK 'men in black'."

Linus goes on to recognize, however, that at least some people feel put off by the new process. One idea that he has come up with is to have BitKeeper generate daily development kernel releases so that everybody could easily track what has been merged. That has not happened yet, but Linus has decided to do away with the -pre prepatches for development kernels, and to make regular releases more frequently. Thus, 2.5.9 and 2.5.10 came out relatively quickly, and without prepatches. If Linus sticks with this approach, kernel development will look more like it did back in the early days.

(Meanwhile, regular dumps of patches from BitKeeper are being posted by both David Woodhouse and Rik van Riel. Larry McVoy has posted statistics on 2.5 changes in BitKeeper by developer and by directory.)
