[LWN.net]

See also: last week's Kernel page.

Kernel development


The current stable kernel release is still 2.4.2. Linus has issued no 2.4.3 prepatches as yet. Alan Cox has not slowed down, however; his prepatch series is up to 2.4.2ac6. As usual, it contains a great many fixes, including another important ReiserFS "zero byte" fix.

A question went out on the differences between Linus's releases and the "ac" patches. There is no definitive list of patches that are unique to one or the other (Alan has no time to maintain one). The "ac" series does tend to pick up everything that goes into the official Linus release, but the reverse is certainly not true.

Linus characterized the difference between the two releases thusly:

The two series are fairly disparate, as they have different intentions. Alan accepts some stuff that I would be nervous about, and sometimes I say "to hell with it, we need to fix this" and make Alan nervous.

Alan, instead, described it this way:

I think the key word is actually probably 'predictability'. The Linus tree is conservative. (IMHO too conservative and probably in his not conservative enough 8))

It looks like we'll have two stable development series for a while.

Meanwhile, the 2.2.19 prepatch is up to 2.2.19pre16. In a separate posting, Alan stated that the real 2.2.19 release is about one week away.

A patch to make NFS work well with ReiserFS was posted by Neil Brown. As was discussed in last week's kernel page, the changes involved are significant. So, as Neil states:

Alan Cox has suggested that these changes may not be appropriate for 2.4, so we might have to wait for 2.5 to see them on kernel.org, but we don't have to wait till then to find the bugs.

That announcement brought out a (predictable, perhaps) set of complaints about yet another stable kernel series with NFS problems. With 2.2, much of the trouble only really got cleared up with 2.2.18, released late last year. And there are still some interoperability problems that will only be fixed when 2.2.19 comes out.

On the 2.4 front, some patience will be required. The Powers That Be may well eventually relent and include Neil's patch if the need appears to be strong enough. But it certainly will not happen until the 2.4 series appears to be rock solid, and experience says that could take a little while yet.

Per-process namespaces are now available for Linux, thanks to a patch posted by Alexander Viro ("He's back. And this time he's got a chainsaw."). The idea is based on the Plan 9 concept of the same name. Essentially, every process in the system gets its own view of the filesystem. Filesystems can be mounted for one process while being entirely invisible to others. Namespaces can be thought of as a much more flexible form of the chroot() system call.
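For readers who want a feel for the idea, here is a toy userspace model in Python. It is only a sketch and has nothing to do with the actual patch: the Namespace and Process classes, the fork() flag, and the filesystem names are all invented for illustration. A process either shares its parent's mount table or gets a private copy, and a mount made in a private copy stays invisible to everybody else.

```python
class Namespace:
    """A mount table: maps mount points to (hypothetical) filesystem names."""

    def __init__(self, mounts=None):
        self.mounts = dict(mounts or {"/": "rootfs"})

    def copy(self):
        # A new namespace starts out as a snapshot of the old one.
        return Namespace(self.mounts)

    def mount(self, point, fs):
        self.mounts[point] = fs

    def resolve(self, path):
        """Return the filesystem serving an absolute path (longest mount point wins)."""
        candidates = [
            m for m in self.mounts
            if path == m or path.startswith(m.rstrip("/") + "/")
        ]
        return self.mounts[max(candidates, key=len)]


class Process:
    def __init__(self, ns):
        self.ns = ns

    def fork(self, new_namespace=False):
        # Illustrative flag: share the parent's table, or take a private copy.
        return Process(self.ns.copy() if new_namespace else self.ns)


init = Process(Namespace())
private = init.fork(new_namespace=True)
private.ns.mount("/mnt", "secretfs")

print(private.ns.resolve("/mnt/data"))  # served by secretfs in the private view
print(init.ns.resolve("/mnt/data"))     # still served by rootfs for everyone else
```

With chroot(), by contrast, a process can only narrow its view to a subtree of the one shared mount table. Here each view can diverge arbitrarily.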

Alexander has also posted a tiny program which starts a shell running in its own namespace, which is useful for testing out the idea. And, of course, he is looking for testers who can find the problems with the patch. Those waiting for a stable version will do so for a while - this patch is intended for the 2.5 series, once it gets started.

Directory indexes for ext2 are another topic that was discussed last week in this space. The discussion continued, but branched off into a couple of interesting areas.

One is in the area of hashing functions. The directory index function depends heavily on a good hashing function to spread the entries evenly across the index. So several candidates have been evaluated by running them in a usermode Linux kernel; the results have been summarized by Daniel Phillips.

The executive summary is that Daniel's own hash function won. In the process, it handily beat the dentry hash function, used since the 2.1 days in the dentry cache. Linus was not entirely surprised by this result:

It looks like the hash function was done rather early on in the dcache lifetime (one of the first things), back when nobody cared about whether it was really good or not because there were many much more complicated questions like "how the h*ll will this all ever work" ;)

So, as a side result, expect to see some work done on the dentry hash function in the near future.

Even more soundly beaten was the "R5" hash used in ReiserFS. In this case, the problem is not that R5 is a poor hash function; it was, instead, written to satisfy a different set of objectives. R5 will put similar filenames next to each other, which makes the ReiserFS lookup algorithm faster. For the ext2 directory index, however, it is more important to spread things out evenly, so a different function is called for.
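To see what "spreading entries evenly" means in practice, consider the following small Python sketch. It does not implement any of the hash functions discussed above. The additive hash is a deliberately weak example, and a truncated SHA-256 digest merely stands in for "some well-mixing function"; both, along with the bucket count and the file names, are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def additive_hash(name: bytes) -> int:
    # Deliberately weak: similar names get nearby (or identical) values.
    return sum(name)


def mixing_hash(name: bytes) -> int:
    # Stand-in for a well-mixing hash: the first 32 bits of SHA-256.
    return int.from_bytes(hashlib.sha256(name).digest()[:4], "big")


def bucket_loads(hash_fn, names, n_buckets):
    """Count how many names land in each of n_buckets index buckets."""
    counts = Counter(hash_fn(n) % n_buckets for n in names)
    return [counts.get(b, 0) for b in range(n_buckets)]


def summarize(loads):
    return {
        "used": sum(1 for load in loads if load),
        "min": min(loads),
        "max": max(loads),
    }


# A directory full of similar names, as in a mail or news spool.
names = [f"file{i:04d}".encode() for i in range(10000)]

for fn in (additive_hash, mixing_hash):
    print(fn.__name__, summarize(bucket_loads(fn, names, 64)))
```

The additive function leaves many of the 64 buckets empty and piles hundreds of names into a few of them. That sort of clustering is harmless, or even useful, when neighbors should stay together, but it is exactly what a directory index wants to avoid.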

The "hash wars" are not done yet, though. Expect some new contenders to show up before too long.

Meanwhile, people started talking about backward compatibility. Ted Ts'o pointed out that, with a very small change to the way the index is stored on disk, full compatibility can be maintained with older ext2 implementations. The cost, in the form of lost space in the directory index, is quite small - less than 1%. Daniel Phillips has not adopted the compatible mode completely, however - he plans to support it as an option in the code so that people can choose the implementation they like better.

When the discussion moved on to tail-block fragmentation, however, Linus felt the need to jump in and argue against backward compatibility. Tail-block fragmentation is the process of splitting up blocks in the filesystem to allow them to hold the last parts of multiple files. Imagine you have an ext2 filesystem with a 4096-byte block size, and a 5000-byte file to store there. That file will occupy two blocks, with only 904 bytes being stored in the second. Thus 3192 of the 8192 bytes allocated, almost 40%, are wasted. In filesystems that store a lot of small files (netnews partitions being the classic example), large amounts of space can be lost. ReiserFS will store small files efficiently, but ext2 has never had that capability.
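The arithmetic is easy to check with a few lines of Python. The sketch below is purely illustrative: the function names are made up, and the first-fit packing of tails into shared blocks is a simplifying assumption of ours, not a description of how any real filesystem lays out its tails.

```python
def block_usage(file_size, block_size=4096):
    """Return (blocks, bytes_in_last_block, wasted_bytes) for one file."""
    if file_size == 0:
        return 0, 0, 0
    blocks = -(-file_size // block_size)  # ceiling division
    last = file_size - (blocks - 1) * block_size
    return blocks, last, blocks * block_size - file_size


def blocks_without_packing(sizes, block_size=4096):
    """Blocks needed when every file gets whole blocks to itself."""
    return sum(block_usage(s, block_size)[0] for s in sizes)


def blocks_with_tail_packing(sizes, block_size=4096):
    """Blocks needed if file tails may share blocks (simple first-fit)."""
    full = sum(s // block_size for s in sizes)
    free = []  # remaining room in each shared tail block
    for tail in (s % block_size for s in sizes):
        if tail == 0:
            continue
        for i, room in enumerate(free):
            if tail <= room:
                free[i] -= tail
                break
        else:
            free.append(block_size - tail)
    return full + len(free)


print(block_usage(5000))  # (2, 904, 3192): the example from the text

spool = [5000] * 4        # four files like the one above
print(blocks_without_packing(spool), blocks_with_tail_packing(spool))
```

In this sketch, four such files need eight blocks when each tail gets a block of its own, but only five when the four 904-byte tails share one.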

When Mr. Phillips mentioned plans to provide tail-block fragmentation for ext2, Linus jumped in and asked that it not be done. He has no objection to the technique; he simply thinks that a whole new filesystem should be created. Rather than just graft on tail-block fragmentation, a complete rethink should be done to create a better, extent-based filesystem with a very large block size. And it should not be called "ext2."

In another posting he explained his reasoning in more detail; it is an interesting look at his philosophy for the evolution of the Linux code. Essentially, creating a new code base makes it easier to eventually get rid of the old one, leading to better long-term maintainability. A transition to a completely new filesystem can be done on the user's own time, and can happen relatively smoothly.

In comparison, if you have "new features in X, which also handles the old cases of X" situation, you not only bind yourself to backwards compatibility, but you also cause yourself to be unable to ever phase out the old code. Which means that eventually the whole system is a piece of crap, full of old garbage that nobody needs to use, but that is part of the new stuff that everybody _does_ use.

This is why, for example, Stephen Tweedie's journaling filesystem is called "ext3."

Will Mosix go into the kernel? Mosix is a fancy clustering system which implements a lot of nice features, such as process migration. Many folks would like to see Mosix, or other clustering implementations, go into the standard kernel sometime in the 2.5 development series. There is, of course, no way to know if that will happen at this point. However, Rik van Riel has created a mailing list where representatives of the various clustering projects can discuss the idea together.


Section Editor: Jonathan Corbet


March 1, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds