Kernel development


The current kernel release is 2.4.4. Linus has put out no kernel releases (not even prepatches) since 2.4.5pre1 came out on May 2.

Alan Cox remains busy; his latest is 2.4.4ac6, which contains another long list of fixes but nothing radical.

To top it off, Alan has also started the 2.2.20 prepatch series with 2.2.20pre1. Only serious fixes are going in at this point: "Expect me to be very picky on changes to the core code now."

Moving block devices to the page cache. In last week's kernel page we looked at a subtle metadata corruption bug brought about by the fact that I/O to block devices uses the buffer cache, while the filesystem code uses the page cache. Conversation on this topic has continued in this (otherwise slow) week, so it's worth another look. Some background first...

Linux systems use two distinct caches to improve performance. Both are used to keep copies of disk-resident data in main memory, and thus to avoid excessive disk I/O operations. These caches are:

  • The buffer cache holds individual disk blocks; entries in the cache are indexed by the device and block numbers. Unix-like systems have had a buffer cache for a very long time, and the block I/O system is built around the "buffer head" structure used to implement the buffer cache.

  • The page cache, instead, holds full pages. The pages come from files in the file system, and, in fact, page cache entries are indexed (more or less) by the file's inode number and the offset within the file. A page is almost invariably larger than a single disk block, and the blocks that make up a single page cache entry may not be contiguous on the disk.

The page cache tends to be easier to deal with, since it more directly represents the concepts used in higher levels of the kernel code. Thus, over time, parts of the kernel have shifted over from using the buffer cache to using the page cache.

The individual blocks of a page cache entry, of course, are still managed through the buffer cache. But, as we saw last week, accessing the buffer cache directly can create confusion between the two levels of caching.
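The difference in indexing can be illustrated with a toy model. The sketch below is not kernel code and does not reflect the real data structures: the device name, inode number, block map, and block size are all made up for illustration, and each cache simply keeps its own copy of the data. It shows a write through the device path that is not reflected in a page already cached through the file path.

```python
BLOCK_SIZE = 1024  # bytes per disk block (assumed for this sketch)

# The "disk": (device, block number) -> block contents.
disk = {("hda1", n): bytes([n]) * BLOCK_SIZE for n in range(8)}

buffer_cache = {}  # indexed by (device, block number)
page_cache = {}    # indexed by (inode number, page index within the file)

# Hypothetical block map: page 0 of inode 42 is stored in four
# blocks that are not contiguous on the disk.
block_map = {(42, 0): [5, 2, 7, 3]}


def read_block(dev, block):
    """Read one block through the buffer cache."""
    key = (dev, block)
    if key not in buffer_cache:
        buffer_cache[key] = disk[key]
    return buffer_cache[key]


def write_block(dev, block, data):
    """Write one block through the buffer cache (write-through to disk)."""
    buffer_cache[(dev, block)] = data
    disk[(dev, block)] = data


def read_page(dev, inode, index):
    """Read one page of a file through the page cache."""
    key = (inode, index)
    if key not in page_cache:
        page_cache[key] = b"".join(disk[(dev, b)] for b in block_map[key])
    return page_cache[key]


first = read_page("hda1", 42, 0)              # page is now cached
write_block("hda1", 2, b"\xff" * BLOCK_SIZE)  # direct write to the device
second = read_page("hda1", 42, 0)             # served from the page cache

# The second block of the cached page still holds the old contents.
stale = second[BLOCK_SIZE:2 * BLOCK_SIZE] != disk[("hda1", 2)]
print("page cache is stale:", stale)
```

In this simplified model nothing ties the two dictionaries together, which is the kind of disagreement that a single cache for both paths would avoid.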

Reading and writing a block device directly, as is done by utilities like dump and fsck, works only with the buffer cache. It turns out that Linus wants to change this behavior, even though he is not tremendously concerned about the corruption problem discussed last week. Having block devices use the page cache will clean up a lot of design issues, improve performance, and get away from the idea of using the buffer cache as a cache. The buffer cache, for Linus, really should just be a low-level block I/O mechanism that leaves the actual caching tasks to higher levels.

Not much time passed before Andrea Arcangeli released a patch moving block I/O into the page cache. Essentially, he has eliminated the special-purpose block_read and block_write functions, and made a block device look like a large file. So now the general-purpose file I/O functions may be used instead.

As an added bonus, Andrea has obsoleted the raw I/O interface, implementing instead an O_DIRECT flag which may be used to perform I/O directly between the device and user space. This change makes raw I/O a much more straightforward affair, since it's no longer necessary to set up and bind the separate /dev/raw devices.

A change of this magnitude, of course, would not normally be expected to go into the 2.4 kernel - though some other surprising things have made it in. Expect to see something like Andrea's patch incorporated early in the 2.5 cycle, however.

ReiserFS - ready for prime time. Hans Reiser has posted a note saying, essentially, that all of the real bugs in the ReiserFS filesystem have been fixed as of 2.4.4. Since the filesystem was included in 2.4.1, its user base has grown greatly and that has, not surprisingly, led to an increase in bug reports. The ReiserFS hackers have been tracking down these problems quickly, and many fixes have come out. As a result, the "beta period" appears to have come to a close.

There are a few outstanding issues, though. ReiserFS still only works on little-endian machines, for example (a patch exists which fixes this problem, but it hasn't seen wide testing yet). You still need to apply an additional patch to use ReiserFS and the NFS server together. And the filesystem checker tool still needs some work. But the biggest problems appear to have been overcome; the "experimental" label may be removed from ReiserFS in a kernel release soon.

The problem of broken configurations in CML2. Now that a lot of the CML2 issues have been resolved, people are starting to think more about how they will actually use the new kernel configuration system. And a bit of a problem has come up.

Anybody who builds a lot of kernels quickly becomes enamored of the "make oldconfig" operation, which makes a configuration from an old kernel work with a new one. It will stop and ask about any new configuration options, and it makes some attempts to resolve things when an old configuration violates the rules in the new kernel.

Some hackers noticed that CML2 did not handle things well when a new kernel adds rules that make an old configuration invalid. Eric Raymond's initial response was to say that recovering from broken configurations was too hard. He had the numbers to back the point up:

But wait! There's more! If some of the variables participate in multiple constraints, the numbers get *really* large. Worst-case you wind up having to filter 3^1976 or

61886985104344314262549831301497223184442226760005632366142367454062\
53798069007245829607511803014461980205195265648765807533359692422405\
26663343478651948197640717559171334587246360190820597462466618699616\
83769466038480440588536443139761873343981834731232898868121056624288\
25175698197266097855144317654507849536499564272166336474891989097438\
35187399533347347604275259693285565328638904436467418552386274533685\
91327533953419273284845915678229675363862482902467758788105098892672\
89040426968478652648633090613090819909922898996729964073665423236084\
87819939319685920863027286269975666073166040062426792612975756185462\
81534154977458915332736966975415596732075433912438120798023875787687\
12139869442963906795755406077094024235937984546041146032870399467676\
50750114775766120549985366981610796100249952621482595580440335923663\
89536648507944663518188694691546583650254496327051865064380044199561\
11898186436375597975714968012719658007155903874756222061921

distinct configurations. The heat-death of the Universe happens while you're still crunching.
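The size of that figure is easy to check. The sketch below assumes, as the exponent suggests, 1976 configuration symbols with three possible values each; the variable names are illustrative only.

```python
# Assumption: 1976 configuration symbols, each with three possible values.
symbols = 1976
values_per_symbol = 3

configurations = values_per_symbol ** symbols
digits = str(configurations)

print(len(digits))  # number of decimal digits in 3^1976
print(digits[:8])   # leading digits, for comparison with the quoted figure
```

The result has 943 decimal digits, which matches the length of the number quoted above.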

People might have been more impressed with this display of mathematical analysis skills if it weren't for the fact that make oldconfig works with the old configuration system. The problem, perhaps, is that the technique used (configure out anything that breaks the rules in the new kernel) lacks the sort of elegance that Eric would like to see in his code:

I guess you didn't know that I trained as a mathematical logician. On the one hand, that predisposes me to try to find "elegant" solutions where you might regard brutality and heuristics as more appropriate.

Elegance appears to have lost, though - witness the announcement of CML2 1.4.0, the "brutality and heuristics" release...
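For illustration, the brute-force technique described above (configure out anything that breaks the rules in the new kernel) might look something like the following. This is a minimal sketch under stated assumptions, not the code used by make oldconfig or CML2: the rule representation, the repair_config function, and the symbol names are all hypothetical.

```python
def repair_config(config, rules):
    """Switch symbols off until no rule asks for further changes.

    config: dict mapping symbol name -> 'y', 'm', or 'n'
    rules:  list of (symbols, predicate) pairs. predicate(config) returns
            True when the rule is satisfied; `symbols` names the symbols
            to turn off when it is not.
    """
    fixed = dict(config)
    changed = True
    while changed:
        changed = False
        for symbols, predicate in rules:
            if predicate(fixed):
                continue
            for name in symbols:
                if fixed.get(name, "n") != "n":
                    fixed[name] = "n"
                    changed = True
    return fixed


def requires(a, b):
    """Build a rule: symbol `a` may only be enabled if `b` is enabled."""
    def satisfied(cfg):
        return cfg.get(a, "n") == "n" or cfg.get(b, "n") != "n"
    return ([a], satisfied)


# Hypothetical old configuration and new-kernel rules.
old_config = {"FOO_DRIVER": "y", "BAR_CORE": "n", "BAZ_EXTRA": "m", "OTHER": "y"}
new_rules = [
    requires("BAZ_EXTRA", "FOO_DRIVER"),
    requires("FOO_DRIVER", "BAR_CORE"),  # rule added by the new kernel
]

new_config = repair_config(old_config, new_rules)
print(new_config)
```

Each pass only ever turns symbols off, so the loop terminates after at most as many changes as there are enabled symbols; there is no search over the space of possible configurations. The price is that the user may lose options (here both FOO_DRIVER and BAZ_EXTRA end up disabled), and a rule that cannot be satisfied by disabling its listed symbols is simply left violated.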


Section Editor: Jonathan Corbet


May 10, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds