See also: last week's Kernel page.

Kernel development


The current kernel release is 2.4.3, which was released during the kernel summit. Here's the changelog describing the fixes that have gone into this version; among other things, it includes the much-awaited loopback mount fix. Alan Cox remains busy, and has released 2.4.3ac3 containing his additional fixes.

The 2.2.19 release notes are also now available.

How stable is 2.4, anyway? Alan Cox's brief response to a frustrated 2.4 user was a bit worrying:

Then install 2.2.19. 2.4.x isnt stable yet. If you have the time then oopses and debugging data are wonderful if not then 2.2 is stable.

The 2.4.3 release, in particular, has drawn a few complaints. Most have to do with processes locking up, a problem which is still unsolved as of this writing. Is 2.4 ready for prime time?

For most people, of course, it is working just fine. There do appear to be some problems, however, for those who run systems under very high load. Alan has detailed some of those problems in his diary (scroll to April 3 if necessary). Some aic7xxx SCSI driver users have also run into surprises - your editor encountered a few with the 2.4.3 prepatches, though the final 2.4.3 seems OK. The SCSI scanning order has also changed for some users with multiple adapters, leading to an unwelcome renaming of the SCSI devices on their systems.

The 2.4.0 kernel was about as stable as it could have been, really. The last set of problems takes a wider community of users to find; that's what "dot-zero" releases are for. Every stable kernel series has taken a few releases to truly stabilize, and 2.4 is no exception. Some rough edges remain, but it's getting there.

Fixing the scheduler. It all started with a posting from Fabio Riccardi regarding some performance problems he has been having. It seems he's working on some improvements to Apache, and he has found that system performance drops badly when he starts running over 1000 processes. Wouldn't it be nice if things could work a little better?

The nature of the problem is quite well known. The Linux kernel maintains a single queue of all the processes that would like to be running on a CPU at any given time. Of course, only one of those processes can actually be running - at least, only one per installed processor. So the scheduler must occasionally decide who actually gets to go. When the time comes to choose the new lucky winner, the entire run queue must be scanned to determine which process is the most deserving. If the queue is long, that scan can take quite some time. To make matters worse, the global runqueue_lock is held during this scan; that means that if another processor on the system is also trying to run the scheduler, it will "spin" in a busy loop waiting for the scan to complete (so it can start the scan itself).
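
For the curious, the pattern looks roughly like the following sketch. It is ordinary user-space C with invented names (pick_next_task(), a simplified goodness() calculation, a fake test-and-set lock), not the actual kernel code, but it captures the shape of the problem: every scheduling decision walks the entire list while holding a single lock.

	/*
	 * Simplified sketch (NOT the real 2.4 scheduler code) of the
	 * selection loop described above: every runnable task is examined
	 * under one global lock, so the cost grows linearly with the length
	 * of the run queue, and other CPUs spin while they wait.
	 */
	#include <stddef.h>

	struct task {
		struct task *next;	/* singly linked global run queue */
		int counter;		/* remaining timeslice */
		int nice;		/* static priority */
		int last_cpu;		/* CPU this task last ran on */
	};

	static struct task *runqueue_head;	/* the single, global run queue */
	static volatile int runqueue_lock;	/* stand-in for the global spinlock */

	/* crude "goodness" value: favor timeslice left, priority, CPU affinity */
	static int goodness(const struct task *p, int this_cpu)
	{
		int weight = p->counter;

		if (weight) {
			weight += 20 - p->nice;
			if (p->last_cpu == this_cpu)
				weight++;	/* small cache-affinity bonus */
		}
		return weight;
	}

	/* pick the best task: an O(n) scan of the entire run queue */
	struct task *pick_next_task(int this_cpu)
	{
		struct task *p, *best = NULL;
		int best_weight = -1;

		while (__sync_lock_test_and_set(&runqueue_lock, 1))
			;	/* other CPUs busy-wait ("spin") right here */

		for (p = runqueue_head; p != NULL; p = p->next) {
			int w = goodness(p, this_cpu);
			if (w > best_weight) {
				best_weight = w;
				best = p;
			}
		}

		__sync_lock_release(&runqueue_lock);
		return best;
	}

With a thousand runnable processes, that loop performs a thousand goodness() evaluations for every scheduling decision - and it does so with all other processors locked out of the scheduler.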

The kernel developers are, for the most part, not tremendously interested in fixing this problem; see Ingo Molnar's post on the subject for a representative viewpoint. The reasons for not "fixing" the scheduler include:

  • An application which needs to run hundreds or thousands of competing processes is considered broken by design. It will never run particularly efficiently due to scheduling overhead, even in a scheduler which is optimized for this case. And, even without scheduler overhead, the cache performance of such an application will be poor.

  • Changes to the scheduler which optimize the many-processes case will almost certainly make the "small number of processes" case worse. The small case, of course, is far more common. A constant theme in kernel development has been that it is wrong to optimize rare situations at the expense of everyday use. As long as proposed scheduler changes cause reduced performance for normal use, they will not make it into the kernel.

  • The current scheduler code is general and flexible. Most proposed changes risk reducing that generality, making it harder to adjust in the future.

That said, there are a couple of scheduler-related projects out there. The Linux Scalability Effort is looking at the scalability of the kernel in general, with schedulers being one aspect of their work. This project has a couple of scheduler patches available, written by Mike Kravetz and Hubertus Franke at IBM's Linux Technology Center. One implements a priority-ordered run queue; the other splits the current run queue into multiple queues, one for each CPU in the system. The multiple queues reduce cross-CPU contention, and reduce the number of processes that must be scanned on any one CPU.
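
To make the idea concrete, here is a purely illustrative sketch of the multiqueue approach; it is not the IBM patch, and the names and fixed NR_CPUS constant are assumptions made for the example. The point is simply that each CPU locks and scans only its own queue.

	/*
	 * Illustrative sketch only -- not the IBM/LSE patch itself.  The idea
	 * behind a multiqueue scheduler: give each CPU its own run queue and
	 * its own lock, so a scheduling decision scans only the local queue
	 * and CPUs no longer fight over a single global runqueue_lock.
	 */
	#include <stddef.h>

	#define NR_CPUS 8		/* assumed fixed CPU count for the example */

	struct task {
		struct task *next;
		int weight;		/* priority, precomputed for brevity */
	};

	struct cpu_runqueue {
		volatile int lock;	/* per-queue lock replaces the global one */
		struct task *head;
		int nr_running;
	};

	static struct cpu_runqueue runqueues[NR_CPUS];

	/* choose the next task for this CPU by scanning only its own queue */
	struct task *pick_next_local(int cpu)
	{
		struct cpu_runqueue *rq = &runqueues[cpu];
		struct task *p, *best = NULL;
		int best_weight = -1;

		while (__sync_lock_test_and_set(&rq->lock, 1))
			;	/* contention is now per CPU, not system-wide */

		for (p = rq->head; p != NULL; p = p->next) {
			if (p->weight > best_weight) {
				best_weight = p->weight;
				best = p;
			}
		}

		__sync_lock_release(&rq->lock);
		return best;
		/* a real implementation also needs load balancing across queues */
	}

A production multiqueue scheduler must also move tasks between queues to keep the load balanced across processors; that complexity is omitted from this sketch.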

The Linux plug-in scheduler page at HP offers a patch that allows new scheduling policies to be added to the system via loadable modules. There is also a multiqueue scheduler available there.

Mr. Riccardi ran some tests with the HP multiqueue patch, and found a significant performance increase for his application. At the same time, he ran some two-process tests and noted no performance degradation at all. These results would seem to warrant another look at alternative schedulers. Perhaps 2.5 will see some activity in that area.

The new kbuild system was, of course, a major topic of conversation at the Kernel Summit in San Jose. Our coverage of that event talked about the motivations behind this work and some of the issues involved, but didn't look that hard at the new code itself. That turns out to have been fortuitous; we're not going to have to do that research because Keith Owens has nicely posted all the relevant information for us.

There are two major components to the new system. One is CML2, the new configuration metalanguage designed and implemented by Eric Raymond; LWN has been tracking CML2 for a while, and took a more detailed look at it back in November, so we'll not get into the details of it here.

The other aspect is the actual building system itself - the makefiles and associated support code. The 2.5 version of kbuild introduces some massive changes in this area that are worth a look.

With the 2.4 kernel, each subdirectory has a relatively small makefile defining the objects to be built. It relies on a large and complicated global Rules.make file for most of the build instructions; it is also highly dependent on the command line options passed to the make command from the global makefile. It can thus be hard to tell what the makefile will do by looking at it; the best approach is often the empirical one: run a big, global make and see what happens.

2.5 makefiles should be clearer, thanks to the new mechanism. The global Rules.make file is gone; instead, each directory's makefile will be able to stand alone. Makefiles will be automatically generated from each directory's Makefile.in file, which is written in a special preprocessor language created by the kbuild team. So, for example, if the object file foo.o should be built only if the CONFIG_BAR configuration option has been selected, a single line in Makefile.in:

	select(CONFIG_BAR foo.o)
would make it so.

The new kbuild system also gives a lot of flexibility to just how the kernel image is built. It is now possible, for example, to build with a read-only source tree, putting the object files in a different directory tree. In this way, different kernels can be built by using different configuration options against the same source tree. Of course the standard "build it in place" mechanism is still supported as the default way of doing things.

Perhaps even nicer is the "shadow tree" concept. It's not all that uncommon for people to apply patches to a stock kernel before building it. These patches can include add-on components (the kdb debugger, ALSA, etc.), external tweaks (your editor uses one which enables S/PDIF I/O on SB Live! cards), or one's own patches. Building with external patches means applying each patch to each kernel that you build; the resulting work can make it very hard to, say, keep up with Alan Cox as he cranks out "ac" releases.

With shadow trees, you keep your patches in one or more separate directory trees, away from the standard kernel source. The build process will then magically merge them all together before compilation. When a new kernel comes along, it can just be built against the same shadow trees. Of course, the build process can't ensure that the patches actually work with a new kernel, but shadow trees can take a lot of the work out of trying.

So when will all this stuff be available? Keith and company worked out a schedule with Linus at the summit. The 2.5.0 and 2.5.1 kernels will still have the old kbuild (as will 2.4, perhaps forever); Linus will use those releases to get the new series started and to launch a surprise or two of his own. If the plan holds, the 2.5.2 release will be dedicated to switching over to kbuild 2.5 and CML2.

Among other things, Linus has apparently said that he has no problem with CML2's Python implementation. That decision means that CML2 can go in as-is, but that there will likely be another round of grumbling among those who don't wish to install Python on their systems.

The kbuild developers, of course, would be happiest if the new system works flawlessly when 2.5.2 (or whatever) is released. So Keith has released the latest version for those who are interested in testing it, or, even better, dealing with some of the remaining issues. The announcement doesn't say where to actually get the new kbuild code, though: it's on the kbuild SourceForge site.

Other patches and updates released this week include:

  • Al Viro has posted a new version of his namespaces patch, which is, he says, "pretty close to final."

  • Karim Yaghmour has released version 0.9.5pre1 of the Linux Trace Toolkit.

  • JFS 0.2.2 has been released by IBM.

  • The Linux USB project has put out a press release describing the USB implementation in the 2.4 kernel.

  • Bharata B. Rao has posted an RFC and associated patch for a mechanism providing global allocation and control of the system's debug registers. Its primary purpose is to allow different kernel debugging mechanisms to play well together.

  • La Monte Yarrow has posted his notes from the Kernel Summit networking BOF.

Section Editor: Jonathan Corbet


April 5, 2001
