See also: last week's Kernel page.

Kernel development


The current kernel release is 2.4.3. Linus has released 2.4.4pre2, which contains another set of fixes (including some of the bug fixes described below). Alan Cox, meanwhile, is up to 2.4.3ac4. While that patch is billed as containing mostly architecture-specific fixes, it also includes a merge of the user-mode Linux port (which was covered on the February 15 LWN kernel page).

Nailing down the bugs. This week saw significant progress toward finding and fixing the remaining serious bugs in the 2.4 kernel.

  • The elusive problem that would cause processes to hang in an uninterruptible ("D") state turns out to have been caused by a bug in the reader/writer semaphore implementation. These semaphores had not been much used until recently, so the bug, which has been present for a long time, had not caused any trouble. Andrew Morton, after pounding on the problem for a while, finally gave up and wrote a completely new implementation which fixes the problem - at the cost of breaking a fair amount of code. It also turns out to be hard to implement the new scheme on old 386 processors, which led to a long discussion of just how well 386 systems should be supported at this stage. It looks like it will be possible to make 386s work reasonably well, though, in the end.

  • The "filesystem corruption under high load" bug was, after great effort, nailed down by Ingo Molnar and others at Red Hat. There is a rare case in the ext2 filesystem where it can drop a block that is still in use; it was introduced in 2.4.0-test6. A patch is out which fixes the problem.

  • One other D-state bug in the logical volume manager code was fixed by Jens Axboe.

  • Jonathan Morton has posted a patch with a number of virtual memory fixes - including one which fixes the problem where the out-of-memory process killer would be invoked too soon.

These fixes should show up in a 2.4.4 prepatch shortly, though the semaphore fix may take a little while to stabilize.

There appear to be some outstanding issues with the aic7xxx SCSI adapter driver, though many of them seem to be the result of incorrectly applied patches.

No more jiffies? An interesting discussion (and patch) came up this week which could lead to a very different timekeeping technique in the kernel. We'll start with a little background...

The kernel currently handles most of its timekeeping tasks by means of the timer interrupt. It's a hardware interrupt driven by the clocks that all modern systems have; on most architectures the clock is programmed to deliver this interrupt 100 times per second (but the Alpha and IA-64 run at 1024). The clock interrupt handler does a number of things, including seeing if the current process has used its allotted CPU time, running any deferred tasks whose time has come, updating process accounting, and incrementing a little variable called jiffies. The jiffies counter is, among other things, a measure of the uptime of the system; it is used for many timing-related tasks within the kernel.
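
As a rough illustration of the work described above, here is a minimal user-space sketch of what a tick handler does. It is not the kernel's code: the structure and function names (task, tick_state, timer_tick, uptime_seconds) are hypothetical, and the deferred-task step is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 100  /* ticks per second on most architectures, per the text */

/* Hypothetical stand-in for a process; not the kernel's task structure. */
struct task {
    unsigned long ticks_left;     /* remaining time slice, in ticks */
    unsigned long ticks_charged;  /* CPU time accounted so far, in ticks */
    int need_resched;             /* set when the slice is used up */
};

/* Hypothetical container for the state the tick handler touches. */
struct tick_state {
    unsigned long jiffies;        /* ticks since boot */
    struct task *current;         /* process on the CPU right now */
};

/* Called once per timer interrupt, i.e. HZ times per second. */
static void timer_tick(struct tick_state *s)
{
    assert(s != NULL && s->current != NULL);
    struct task *t = s->current;

    s->jiffies++;                 /* the global tick counter */
    t->ticks_charged++;           /* whole tick billed to whoever is running */
    if (t->ticks_left > 0)
        t->ticks_left--;
    if (t->ticks_left == 0)
        t->need_resched = 1;      /* allotted CPU time has been used */
}

/* Uptime falls out of the counter directly. */
static unsigned long uptime_seconds(const struct tick_state *s)
{
    return s->jiffies / HZ;
}
```

Everything here happens at tick granularity, which is the root of the limitations that led to this discussion.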

The timer tick system has been seen as imperfect for a while. Among other things, it imposes a 10ms resolution on most timing-related activities, which can make life hard for user-space programs that need tighter control over time. It also guarantees that process accounting will be inaccurate. Over the course of one 10ms jiffy, several processes might have run, but the one actually on the CPU when the timer interrupt happens gets charged for the entire interval.
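
The accounting problem is easy to see in a small simulation. The sketch below is purely illustrative (the slice structure and both functions are invented for this example); it compares what each process would be charged under per-tick accounting with the time it actually ran.

```c
#include <assert.h>
#include <stddef.h>

#define TICK_US 10000UL  /* one 10ms jiffy, in microseconds */

/* One stretch of CPU time: which process ran, and for how long. */
struct slice {
    int pid;               /* index into the charge array */
    unsigned long run_us;  /* how long it actually ran */
};

/* Tick-based accounting: each tick is billed in full to whichever
 * process is on the CPU at the instant the tick fires. */
static void charge_by_tick(const struct slice *s, size_t n,
                           unsigned long *charged_us, size_t npids)
{
    unsigned long now = 0;              /* start of the current slice */
    unsigned long next_tick = TICK_US;  /* time of the next tick */

    for (size_t i = 0; i < n; i++) {
        assert(s[i].pid >= 0 && (size_t)s[i].pid < npids);
        unsigned long end = now + s[i].run_us;
        /* Ticks that fire while this slice is running: now <= t < end */
        while (next_tick < end) {
            charged_us[s[i].pid] += TICK_US;
            next_tick += TICK_US;
        }
        now = end;
    }
}

/* Exact accounting, for comparison: bill each slice for what it used. */
static void charge_exact(const struct slice *s, size_t n,
                         unsigned long *charged_us, size_t npids)
{
    for (size_t i = 0; i < n; i++) {
        assert(s[i].pid >= 0 && (size_t)s[i].pid < npids);
        charged_us[s[i].pid] += s[i].run_us;
    }
}
```

Feed it three processes that run for 3ms, 3ms, and 5ms in turn: the tick at 10ms lands in the third process, which is billed the full 10ms, while the first two are billed nothing at all.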

A new problem came up, however, over at IBM. On their S/390 mainframes, they can run a great many independent "Linux images," each of which is a full, independent kernel. With its own timer interrupt. As Martin Schwidefsky pointed out in his posting on the subject, with 1000 images running, the timer interrupt overhead gets to be significant - up to 100% of the available CPU. That, of course, is not the sort of mainframe performance that IBM had in mind, so they had to make some changes. Those changes, essentially, were to eliminate both the timer tick and the jiffies variable.

The timer tick can go away because the kernel does, in general, know when something will next need its attention. There's a handy, sorted list of upcoming timer events, and the kernel knows how long the current process should be allowed to run before being scheduled out. So, the system's interval timer can be set to exactly the right time when something needs to happen. This timer can, simultaneously, be set with much higher resolution and to a much longer interval than the regular clock tick.
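
In rough terms, the idea looks something like the following sketch. This is not the IBM patch: the timer_event structure and both function names are hypothetical, and a real kernel would have more to consider than these two inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer list entry, kept sorted by expiry time. */
struct timer_event {
    uint64_t expires_us;       /* absolute expiry time, microseconds */
    struct timer_event *next;  /* next-later event, or NULL */
};

/* Insert an event so that the list stays sorted, earliest first. */
static void add_timer_sorted(struct timer_event **head,
                             struct timer_event *ev)
{
    assert(head != NULL && ev != NULL);
    while (*head && (*head)->expires_us <= ev->expires_us)
        head = &(*head)->next;
    ev->next = *head;
    *head = ev;
}

/* How far ahead to program the interval timer: whichever comes first,
 * the earliest pending timer or the end of the current time slice. */
static uint64_t next_wakeup_delay_us(const struct timer_event *head,
                                     uint64_t now_us,
                                     uint64_t slice_left_us)
{
    uint64_t delay = slice_left_us;

    if (head) {
        /* An already-expired timer needs attention immediately. */
        uint64_t until = head->expires_us > now_us
                             ? head->expires_us - now_us : 0;
        if (until < delay)
            delay = until;
    }
    return delay;
}
```

Nothing ties the result to a 10ms grid: the delay can be a few microseconds or several seconds, depending only on what is pending.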

Eliminating jiffies is a little tricky, since a great deal of code makes use of it. A quick grep for jiffies in the 2.4.3 source turns up over 3700 references. The variable needs to go, since there isn't a nice, regular clock tick to keep it updated. But fixing that many places in the source just does not sound like a whole lot of fun. For those of you who are into the details, the IBM S/390 fix looks like:

  #define jiffies ({ \
          uint64_t __ticks; \
          asm ("STCK %0" : "=m" (__ticks) ); \
          __ticks = (__ticks - init_timer_cc) >> 12; \
          do_div(__ticks, (1000000/HZ)); \
          ((unsigned long) __ticks); \
  })
Essentially, every reference to jiffies gets turned into a read of the real-time clock. Since every access to jiffies (except one) is a read, this technique works - for the IBM architecture, which has relatively new and clean code.
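
The arithmetic in that macro can be tried out in user space. The sketch below rests on a few assumptions: that init_timer_cc holds the clock value saved at boot (that is our reading of the macro, not something the posting spells out), and that the function and parameter names are ours. The S/390 time-of-day clock stored by STCK advances bit 51 once per microsecond, so shifting right by 12 yields microseconds; do_div() is the kernel's helper for dividing a 64-bit value in place, replaced here by ordinary C division.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw time-of-day clock reading into a jiffies-style count.
 *   tod      - clock value as STCK would store it
 *   boot_tod - clock value saved at boot (assumed role of init_timer_cc)
 *   hz       - desired ticks per second (100 on most architectures)
 */
static unsigned long tod_to_jiffies(uint64_t tod, uint64_t boot_tod,
                                    unsigned int hz)
{
    assert(hz > 0 && hz <= 1000000);

    /* Drop the 12 sub-microsecond bits: result is microseconds since boot. */
    uint64_t usecs = (tod - boot_tod) >> 12;

    /* Microseconds per tick is 1000000/hz; divide to get whole ticks. */
    return (unsigned long)(usecs / (1000000u / hz));
}
```

One second after boot the clock has advanced by 1000000 << 12 units, and the function returns 100 for hz = 100, just as a counter incremented by a 100Hz interrupt would have.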

This approach fixes almost all of the problems with the old scheme. The regular timer interrupts, along with their overhead, are gone. The timer on most systems can be programmed with great precision, meaning that very high resolution timers can be supported. That will make certain types of processes (MIDI sequencers, software modem drivers, high-speed video, etc.) run far better. And process accounting, done when the process reschedules, will be extremely accurate.

The change is not without its costs, though. The code changes are significant, of course, meaning that this change is a 2.5 item. A certain amount of extra overhead will need to be added to system calls to keep everything updated in the absence of a timer tick. If not done carefully, this overhead could outweigh the savings on normal systems (which do not run 1000 independent Linux images...). There is also some overhead added to the scheduler.

In fact, George Anzinger, one of the developers behind the high resolution timers project, posted a message stating that the project had decided to avoid the no-tick approach due to the cost of that extra overhead. They seem willing to reconsider, though. The advantages of this approach seem to be strong; we may well see it adopted in the 2.5 development series.

CML2 1.0 released. Eric Raymond has announced the 1.0 release of CML2, the new kernel configuration system. The announcement talks about the plans for integrating CML2 into the 2.5 development series, and provides a lengthy discussion on why CML2 is better. (See also last week's LWN kernel page for a discussion of the new kernel build system as a whole.)

Kernel summit webcast available. As LWN readers are probably tired of hearing, the Linux 2.5 Kernel Summit was held on March 30 and 31. The presentations at the summit were videotaped, and they are now available in RealPlayer format from the OSDN web site.

Toward a security module interface. One of the conclusions that came out of the Kernel Summit was that the various groups working on security enhancements to Linux should agree on a standard interface. In that way, the projects could interoperate, and it would be easy to switch from one approach to another. To that end, Crispin Cowan has announced the creation of the "security module" mailing list. The purpose of the list is to explore the enhancement of the kernel module interface to support the development of pluggable security modules. Those who are interested in the topic are encouraged to sign up; subscription instructions are in the announcement.

Other patches and updates released this week include:

  • Romain Dolbeau has posted a framebuffer driver for the Permedia3 chipset.

  • Jeff Dike has released a version of user-mode Linux for the 2.4.3 kernel.

  • Daniel Phillips has posted a document describing the on-disk format of his ext2 directory indexes.

  • Maneesh Soni posted a patch which improves the performance of file descriptor management on SMP systems.

  • Justin Gibbs released version 6.1.11 of the aic7xxx driver.

  • Andre Hedrick posted an IDE driver patch which provides support for the Promise Ultra100 TX2 chipset.

  • LVM 0.9.1 was released by Heinz J. Mauelshagen.

  • Version 0.1.1 of the device registry patch (which provides a database of all physical devices on the system) has been posted by Tim Jansen.

Section Editor: Jonathan Corbet


April 12, 2001

Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds