[LWN.net]


Kernel development


The current kernel release is still 2.4.2. The current prepatch is 2.4.3pre6, released early in the morning on March 21. The patch log file is, as of this writing, only updated to 2.4.3pre5, however.

No 2.2.19 prepatches have been released this week.

Changing the memory map semaphore. One of the changes that is now in the 2.4.3 prepatch is a new memory map locking scheme implemented by Rik van Riel. The memory map semaphore controls access to the various virtual memory areas and page tables used by a process; it is intended to keep concurrent activities, such as page faults, memory map changes, and informational queries, from stepping on each other. It is a fundamental part of how the virtual memory system works.

It also, seemingly, is a performance problem. For example, programs that use the /proc interface to get process information can find themselves blocked for long periods of time. Page faults, too, can be slowed down, even when they occur in different places and should not conflict with each other. Multi-threaded programs, such as the MySQL server or Apache 2.0, are restricted to handling just one page fault at a time across the whole set of threads. In some cases, this restriction can lead to very poor performance.

Rik's change is to turn the memory map semaphore into a variant known as a reader-writer semaphore (or R/W semaphore). These semaphores allow multiple threads to access a common data structure simultaneously, as long as none of them makes any changes. Once a thread needs to change things, it must wait until all of the readers have finished their business, then lock them out for the duration of the change.

An R/W semaphore suits this situation well, since both the /proc and page fault cases do not actually need to change the memory map. With the change applied, the system can do more things simultaneously. Even on uniprocessor systems, things will work better, since work need not wait for the resolution of a page fault, which can involve disk activity.

It's also a relatively fundamental and scary change for a stable kernel release. Even Linus, while accepting the change, is a little nervous about it:

I'm applying this to my tree - I'm not exactly comfortable with this during the 2.4.x timeframe, but at the same time I'm even less comfortable with the current alternative, which is to make the regular semaphores fairer (we tried it once, and the implementation had problems, I'm not going to try that again during 2.4.x).

The patch also, as of 2.4.3pre5, "has only been tested on i386 without PAE, and is known to break other architectures." There have been some good reports, though, on the performance effects of this patch. But it may mean that the real 2.4.3 will not be out for a while yet, since Linus will want to give it some time to stabilize and prove that everything works.

Global kernel analysis. Dawson Engler, at Stanford, has put together an extension to the gcc compiler which allows it to perform detailed, global analysis of a body of code and point out a number of possible bugs. Over the last week, he and his students have been posting the results of this work. They have found some impressive things, including:

  • Places where pointers are interpreted as user-space addresses (i.e. they are passed to a function like copy_to_user), but where the same pointer is also dereferenced directly (nine cases). Kernel code running in process context can generally get away with that sort of reference, but it's risky for a few reasons. The user-space address may not be valid (or the page could have been swapped out since the kernel last checked), and there are security implications as well.

  • Large variables on the kernel stack (22 cases, plus a few more when devfs is used). The kernel stack is limited in size, and putting large variables there risks overflowing the allocated space.

  • Various locking bugs (16 cases). These include paths that could take out a lock and forget to unlock it, and potential misuse of the processor state flags.

  • Places where kernel memory is used after it has been freed (14 cases).

  • Inconsistent treatment of interrupts (28 cases). Code that sometimes runs with interrupts enabled and other times not is likely to be buggy; functions which sometimes forget to reenable interrupts certainly are.

  • Places where a pointer returned by a function that can fail is not checked (120 cases).

  • Calls to functions that can block while interrupts are disabled or spinlocks are held (163 cases). Kernel code, of course, should not block in either case, or serious performance problems (or deadlocks) can result.
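To make the locking class of bug concrete, here is a user-space sketch of a path that returns without unlocking, next to a corrected version. Everything in it is hypothetical (toy_dev, update_leaky, update_fixed, lock_is_free), it uses a POSIX mutex rather than a kernel lock, and it is not taken from any of the reported cases.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical device state protected by a lock. */
struct toy_dev {
    pthread_mutex_t lock;
    int value;
};

void toy_dev_init(struct toy_dev *dev)
{
    pthread_mutex_init(&dev->lock, NULL);
    dev->value = 0;
}

/* Buggy: the error path returns with dev->lock still held. */
int update_leaky(struct toy_dev *dev, int v)
{
    pthread_mutex_lock(&dev->lock);
    if (v < 0)
        return -EINVAL;  /* BUG: missing unlock on this path */
    dev->value = v;
    pthread_mutex_unlock(&dev->lock);
    return 0;
}

/* Fixed: every path leaves through the single unlock at "out". */
int update_fixed(struct toy_dev *dev, int v)
{
    int ret = 0;

    pthread_mutex_lock(&dev->lock);
    if (v < 0) {
        ret = -EINVAL;
        goto out;
    }
    dev->value = v;
out:
    pthread_mutex_unlock(&dev->lock);
    return ret;
}

/* Returns 1 if the lock is free (and leaves it free), 0 if it is held. */
int lock_is_free(struct toy_dev *dev)
{
    if (pthread_mutex_trylock(&dev->lock) != 0)
        return 0;
    pthread_mutex_unlock(&dev->lock);
    return 1;
}
```

After update_fixed(&dev, -1) the lock is free again; after update_leaky(&dev, -1) it stays held, so the next caller would block forever. A rarely taken error path like this is easy to miss in review, and a checker that follows every path through a function can flag it.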

The response from the kernel hackers has been quite positive, for one simple reason: quite a few new bugs have been found. Many of the things being tested for are the sort of subtle bug that can be very easy to create and hard to track down.

The tool that is doing this work is called "MC" ("meta-level compilation"); it was created by a team headed by Mr. Engler and sponsored by DARPA grant MDA904-98-C-A933. MC defines an extension language for gcc called "metal," which can be used to program specific checks to be applied to the code. Here, for example, is a piece of code which looks for errors in enabling and disabling interrupts:

{ #include "linux-includes.h" }
sm check_interrupts {
  // Variables used in patterns
  decl { unsigned } flags;

  // Patterns to specify enable/disable functions
  pat enable = { sti(); }
             | { restore_flags(flags); };
  pat disable = { cli(); };

  // States
  // The first state is the initial state
  is_enabled: disable ==> is_disabled
     | enable ==> { err("double enable"); };
  is_disabled: enable ==> is_enabled
     | disable ==> { err("double disable"); }
     // Special pattern that matches when the SM
     // hits the end of any path in this state
     | $end_of_path$ ==> { err("exiting w/intr disabled!"); };
}

Those who are interested in MC should check out Mr. Engler's paper "Checking system rules using system-specific, programmer-written compiler extensions," which is available on the net in PostScript format. The code fragment above was taken from that paper. Please don't bug Mr. Engler about obtaining the code, however; the system is still under development and has not yet been generally released. In time, though, it should become part of the standard kernel hacker's toolkit.
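For readers unfamiliar with state-machine checkers, the logic of the check_interrupts rule quoted earlier can be mimicked in plain C. This is a rough sketch under stated assumptions and not how MC itself works: the names intr_event, intr_result, and check_path are invented here, the "path" is just an array of events rather than a route through compiled code, and the walk stops at the first violation for simplicity.

```c
#include <stddef.h>

/* Events a code path can generate (hypothetical encoding). */
enum intr_event { EV_ENABLE, EV_DISABLE };

/* Outcomes, mirroring the three err() messages in the metal rule. */
enum intr_result {
    INTR_OK,
    INTR_DOUBLE_ENABLE,
    INTR_DOUBLE_DISABLE,
    INTR_EXIT_DISABLED
};

/*
 * Walk one path (a sequence of enable/disable events) and report the
 * first rule violation. Stopping at the first error is a simplification
 * made by this sketch, not a claim about how MC behaves.
 */
enum intr_result check_path(const enum intr_event *path, size_t len)
{
    int enabled = 1;  /* initial state: is_enabled */
    size_t i;

    for (i = 0; i < len; i++) {
        if (path[i] == EV_ENABLE) {
            if (enabled)
                return INTR_DOUBLE_ENABLE;
            enabled = 1;   /* is_disabled -> is_enabled */
        } else {
            if (!enabled)
                return INTR_DOUBLE_DISABLE;
            enabled = 0;   /* is_enabled -> is_disabled */
        }
    }
    /* End of path: leaving with interrupts disabled is an error. */
    return enabled ? INTR_OK : INTR_EXIT_DISABLED;
}
```

A path consisting of disable followed by enable yields INTR_OK, while a lone disable yields INTR_EXIT_DISABLED.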

JFFS2 released. The folks at Red Hat have announced the release of the JFFS2 filesystem. It's a complete reimplementation of Axis Communications' Journaling Flash Filesystem, with a number of improvements. It's available via CVS, and only works with the 2.4 kernel. An iPAQ kernel with JFFS2 built in is available as well.

Help out the kernel manual pages. Andries Brouwer has released man-pages-1.35. In the announcement, he notes:

David Mosberger expressed his worry that especially man page Section 2 is out-dated and x86 specific, with no indication that other architectures even exist. No doubt he is right.

So the request has gone out: please point out the man pages that are wrong, and, if possible, supply fixes while you're at it. This is a good way for people to help out without having to actually hack on the kernel code.

FSM's kernel patch. Kernel patches do not normally come with press releases, or, at least, they didn't. This week, FSMLabs (the RTLinux company) announced that it had released a memory management patch. It seems that a memory management change in 2.4 creates some difficulties for RTLinux, so they went and developed a fix. And announced it to the world.

The patch itself is quite small, especially considering that the one real chunk of code there is lifted from the MIPS version of <asm/pgalloc.h>. It adds a couple of big kernel lock invocations, and a function which propagates page directory changes across processes and CPUs. That's evidently enough to restore low latency on a reliable basis for real-time tasks.


Section Editor: Jonathan Corbet


March 22, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds