Kernel development

The current development kernel release is 2.3.17. Once again, this is a very large patch. There are a lot of driver changes, and one can see the first results of Alan Cox's attempts to clean up the SCSI code and make it somewhat more readable. Note that this isn't the long-awaited rewrite of the SCSI layer; it's just some cleaning up so that he can get on with trying to find some other problems...

The current stable kernel release is still 2.2.12. Some problems are still being reported with this kernel - in particular, there still appears to be a memory leak that tends to turn up on systems running the INN news server. As of this writing, the developers are still trying to chase that one down.

Trouble with 2.2 - sort of. In the August 19 issue of LWN we reported that RAID 0.90 was being folded into the 2.2.12 kernel. A week later we had to update that report and note that the RAID patches had been pulled back out. Why? Too many people objected to such a large change - which requires new user tools - going into a stable kernel minor release.

Now it appears that the NFS server patches will suffer a similar fate. These patches, developed by H.J. Lu and others, are absolutely necessary for sites doing serious NFS service with Linux systems. Heterogeneous environments, in particular, frequently turn up problems with the stock 2.2 NFS server. The patches add no new functionality; they just make the server actually work. But they require recent versions of the user space tools.

Serious users of both RAID and NFS have been applying these patches by hand for as long as they have been using the 2.2 kernel. A number of distributions also ship versions of the kernel with the patches applied. The natives on linux-kernel are starting to get restless. These patches are considered necessary by many just to get a working system. Why do they not find their way into the mainstream kernel?

There seem to be a few problems here:

The 2.2 kernel shipped too early. Given just how long we all waited for that kernel, it may seem a bit strange to say that it should have taken longer. But the point is that 2.2 shipped with certain capabilities - such as NFS service - in an essentially nonworking state.
It did not help that NFS was without a maintainer for much of the 2.1 series.

Most importantly, at this point:

The Linux user community is increasingly intolerant of disruptive changes in the middle of a stable kernel series. Minor updates to a stable kernel are supposed to just work. Given the environment out there, where an occasional kernel security problem is bound to turn up and require fixing, this point of view cannot be ignored. People who are using Linux systems to get real work done do not want to have to go through a disruptive upgrade just to close a hole.

It thus seems that stable kernels, increasingly, will have to remain truly stable. Even important changes get blocked out at minor release time.

So how does the kernel make progress in this environment? The recipe would seem to be more frequent major releases, each of which contains a rather smaller set of changes. If stable kernels are truly stable at (or shortly after) their release, and a new release is not more than two years away, people can calmly wait for larger changes to be integrated.

The 2.3 feature freeze, originally promised for almost a month ago, still has not been announced. If a 2.4 release - which can contain working RAID and NFS implementations - is to happen before the end of the year, this freeze needs to happen soon, if it is not already too late.

Big memory and Raw I/O. LWN first reported on the "big memory patch," which allows Intel-based Linux systems to address up to 4GB of memory, back in the August 19 issue. This week Siemens and SuSE, the sponsors of that development, issued a press release announcing the patch and pointing out that it was included in 2.3.15.

There is a remaining loose end or two, however, with the big memory patch. In particular, it breaks Stephen Tweedie's raw I/O patch, which was also recently added to the development series. The raw I/O patch allows data to be transferred directly between user-space buffers and a device. There is an obvious performance gain in some situations, since a copy through the kernel's buffer cache can be avoided.

Just as important, however, is simply avoiding the cache altogether. Caching some kinds of data is wasteful, since there will not be another need for it. Rather than improving performance, caching of such transient data has only the effect of forcing out everything else, leading to a sluggish system. Anybody who has had to wait for the window system to respond after a large program build or file copy has seen this mechanism in action. Caching can also be a problem when disks are shared between more than one system.
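The cache-eviction effect described above can be illustrated with a toy model. This is only a sketch: it assumes a simple least-recently-used cache, which is not a description of how the kernel's buffer cache actually chooses what to evict, and the class name, cache size, and block counts are all made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used block cache (an assumption, not the kernel's policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, block):
        if block in self.blocks:
            # Cached: mark as most recently used.
            self.blocks.move_to_end(block)
            self.hits += 1
        else:
            # Not cached: load it, evicting the oldest block if full.
            self.misses += 1
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)


def working_set_hits(bypass_cache_for_stream):
    """Count cache hits on a small working set after a large one-time transfer."""
    cache = LRUCache(capacity=100)
    working_set = [("ws", i) for i in range(50)]

    # Warm the cache with the data that is actually reused.
    for block in working_set:
        cache.access(block)

    # A large transfer of data that will never be needed again
    # (think of a big file copy). With bypass, it never touches the cache.
    for i in range(1000):
        if not bypass_cache_for_stream:
            cache.access(("stream", i))

    # Return to the working set and see how much of it survived.
    cache.hits = 0
    for block in working_set:
        cache.access(block)
    return cache.hits
```

In this model, working_set_hits(False) returns 0, since the transient data has pushed out every useful block, while working_set_hits(True) returns 50, since the working set is left undisturbed.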

Why is there trouble with raw I/O in particular? It seems that quite a few devices out there are unable to address high memory - memory above 2GB. Attempts to tell such devices to move data to or from high memory can result in total failure at best, and a corrupted system is a distinct possibility. The kernel is careful to keep its own buffers in lower memory so that this sort of problem does not arise. But raw I/O uses user-space buffers, which can end up anywhere. For this reason, the big memory patch currently disallows any sort of raw I/O to high memory.

The solution in this case appears to be "bounce buffers." A bounce buffer is a kernel-space buffer which lives in low memory. When I/O is requested to a high memory page, and the device cannot handle it, an intermediate copy is made via the bounce buffer. This technique defeats the "zero copy" aspect of raw I/O, but preserves the other advantages. It can also be implemented so that bounce buffers are only used when they are truly needed. A proper implementation with bounce buffers should not only solve the raw I/O problem, but it should also allow the page cache to exist in high memory.
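The bounce buffer idea can be sketched as a small user-space simulation. Nothing here is kernel code: the Memory and Device classes, the raw_read function, and the 2GB device limit (taken from the figure quoted above) are hypothetical names and values chosen purely for illustration.

```python
PAGE_SIZE = 4096
# Assumed limit: the simulated device cannot reach addresses at or above 2GB.
DEVICE_ADDR_LIMIT = 2 * 1024**3


class Memory:
    """Sparse model of physical memory: page-aligned address -> page contents."""

    def __init__(self):
        self.pages = {}

    def page(self, addr):
        assert addr % PAGE_SIZE == 0, "address must be page-aligned"
        return self.pages.setdefault(addr, bytearray(PAGE_SIZE))


class Device:
    """Simulated device that can only transfer data to addresses below addr_limit."""

    def __init__(self, memory, addr_limit, data):
        assert len(data) == PAGE_SIZE
        self.memory = memory
        self.addr_limit = addr_limit
        self.data = data  # contents returned by every simulated read

    def dma_read_into(self, addr):
        if addr >= self.addr_limit:
            raise ValueError("device cannot address this page")
        self.memory.page(addr)[:] = self.data


def raw_read(device, memory, target_addr, bounce_addr):
    """Read one page into target_addr; return True if a bounce buffer was needed."""
    if target_addr < device.addr_limit:
        # The device can reach the user's page directly: zero copy.
        device.dma_read_into(target_addr)
        return False
    # Otherwise transfer into the low-memory bounce page, then copy upward.
    device.dma_read_into(bounce_addr)
    memory.page(target_addr)[:] = memory.page(bounce_addr)
    return True
```

Note that the extra copy happens only on the high-memory path; a transfer to a low page still goes straight to its destination, which is the "only used when they are truly needed" property mentioned above.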

Finally, when the day arrives that more than 4GB of memory can be supported, bounce buffers will become even more necessary. A lot of PCI devices out there do not handle 64-bit addressing and will need help at that point, even if they currently work with high memory. (Thanks to Stephen Tweedie, whose linux-kernel messages were ruthlessly plundered for this article.)

A few other patches and updates released this week:

Mikael Pettersson released version 0.6 of his performance monitoring counters patch; this version includes support for the AMD Athlon processor.
Devfs v120 was released by Richard Gooch.
David Skingsley released a Kallisto GPS card driver.

Section Editor: Jon Corbet

September 9, 1999