See also: last week's Kernel page.

Kernel development


The current kernel release is still 2.4.2. The 2.4.3 prepatch is up to 2.4.3pre8; some of the issues with the memory management changes are still being worked out, so expect another prepatch or two before the real 2.4.3 release comes out. Alan Cox, meanwhile, is up to 2.4.2ac27.

The 2.2.19 kernel has been released, finally. No release notes are available yet, but the final release looks very much like 2.2.19pre18.

A couple of 2.4.2 problems have been biting people. While the 2.4 kernel is highly stable for most users, a pair of known problems has been creating difficulties for some. Here's what you should be watching out for if you're running 2.4.2:

  • Loopback mounts do not work. This problem is not new - it has been known since the 2.4.0-test days - but it has never been widely publicised. Loopback mounts allow the mounting of a filesystem contained within a regular file on another filesystem; a common use is mounting ISO (CDROM) images that are sitting in a large disk file. If you attempt that under 2.4.2, the mount process goes into an uninterruptible wait, thus becoming an unkillable process.

    A patch, written by Jens Axboe, has been in circulation for some time; it is already incorporated into the "ac" series and into the 2.4.3 prepatch. Some vendors shipping 2.4 kernels have integrated the patch into their systems as well. This particular problem will be history soon. (For the curious, a sketch of what a loopback mount actually involves appears after this list.)

  • The "out of memory" (OOM) killer is being invoked too soon. The OOM killer is supposed to run when the system needs memory and is absolutely unable to find any. Its job is to start killing processes to free up some memory, while doing its best to not kill anything important. The approach used by the OOM killer was discussed in some detail in October 12, 2000 LWN kernel page.

    The OOM killer has gotten some bad press this week from people who think it chooses badly, or that it should not exist at all. Doing without an OOM killer entirely would be hard; even if the kernel is patched so that it does not overcommit memory, situations can arise when memory is simply not to be found. The alternative to killing a process in that situation is, generally, to allow the system to lock up.

    Most users, however, should never have the opportunity to see the OOM killer in operation; it takes a severely stressed system to run that short of memory. That is the idea, anyway. It would appear, though, that 2.4.2 invokes the OOM killer even when there is plenty of memory that the system should be able to free without killing processes. Nobody has yet announced that they have found or fixed the problem, however.
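
To make the loopback mechanism concrete, here is a minimal user-space sketch of what a loopback mount involves - essentially what "mount -o loop" does by way of losetup. The image path, loop device, mount point, and filesystem type are illustrative assumptions, and the whole thing needs root; under an unpatched 2.4.2, the final mount() call is where things wedge.

    /*
     * Minimal sketch of a loopback mount: bind a regular file to a
     * loop device with LOOP_SET_FD, then mount that device.  Paths
     * and filesystem type are illustrative; must be run as root.
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/mount.h>
    #include <linux/loop.h>

    int main(void)
    {
        int file_fd = open("/tmp/image.iso", O_RDONLY);   /* hypothetical image file */
        if (file_fd < 0) { perror("open image"); return 1; }

        int loop_fd = open("/dev/loop0", O_RDONLY);        /* assumed free loop device */
        if (loop_fd < 0) { perror("open /dev/loop0"); return 1; }

        /* Attach the backing file to the loop device (what losetup does). */
        if (ioctl(loop_fd, LOOP_SET_FD, file_fd) < 0) {
            perror("LOOP_SET_FD");
            return 1;
        }

        /* Mount the loop device like any other block device; under an
         * unpatched 2.4.2 this step ends up in an uninterruptible wait. */
        if (mount("/dev/loop0", "/mnt/iso", "iso9660", MS_RDONLY, NULL) < 0) {
            perror("mount");
            ioctl(loop_fd, LOOP_CLR_FD, 0);   /* detach the file again on failure */
            return 1;
        }

        close(file_fd);
        close(loop_fd);
        return 0;
    }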

Regression testing for the Linux kernel? Problems like the loopback lockup described above lead some to wonder whether the kernel needs a formalized regression testing system. Given the complexity of the system, weird bugs are going to be a frequent consequence of code changes. Many software development projects employ regression testing to trap as many of those problems as possible before they bite somebody. But the Linux kernel has never had a serious regression testing program.

Some aspects of the kernel are rather resistant to formal regression testing. In particular, it would be difficult indeed to formally test all of the possible hardware combinations out there. For this sort of testing, the kernel probably already has the ideal setup: thousands of brave souls who routinely download and run development kernels. These testers can check things out on their hardware, but they are not the same as a formal testing program that is designed to cover as much of the code as possible.

There are a couple of testing efforts out there now. The most prominent, perhaps, is the Linux Test Project, which is run by SGI. It currently includes about 100 tests, most of which check the performance of various system calls (there is also one that tests f00f bug handling). The PowerPC architecture also has a limited set of regression tests to be sure that its kernels can build and boot.
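
For a sense of what such a test looks like, here is a minimal, self-contained example in the spirit of a system call regression test. It is not LTP code; the particular check - that open() on a nonexistent path fails with ENOENT - is just an illustration.

    /*
     * Illustrative system call regression test (not actual LTP code):
     * open() on a path that cannot exist must fail and set ENOENT.
     */
    #include <stdio.h>
    #include <errno.h>
    #include <fcntl.h>

    int main(void)
    {
        errno = 0;
        int fd = open("/no/such/path/at/all", O_RDONLY);

        if (fd >= 0 || errno != ENOENT) {
            printf("FAIL: open() returned %d, errno %d\n", fd, errno);
            return 1;
        }
        printf("PASS: open() on a missing path fails with ENOENT\n");
        return 0;
    }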

These are both good efforts, but they are a far cry from a comprehensive testing program. A complete job will be a tremendous amount of work, and it remains to be seen whether anybody can find enough motivation to take it on.

How big should dev_t be? Linux, like its Unix ancestors, has from the beginning identified devices with a sixteen-bit number, known by its C type dev_t. Of those sixteen bits, eight are the major number (essentially, the index of the driver which handles the device), and eight are the minor number (usually interpreted by the driver as a unit number). Thus, a total of 256 major and 256 minor numbers are available (well...OK...actually double that, since the number spaces for block and char devices are independent).
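
That encoding is easy to show in code. The macros in the small example below mirror the MAJOR(), MINOR(), and MKDEV() definitions found in the 2.4 kernel's <linux/kdev_t.h>; /dev/sda1 (major 8, minor 1) serves as the sample device.

    /*
     * The traditional 16-bit dev_t: major number in the high byte,
     * minor number in the low byte.  These macros mirror the 2.4
     * kernel's definitions in <linux/kdev_t.h>.
     */
    #include <stdio.h>

    #define MAJOR(dev)          ((dev) >> 8)
    #define MINOR(dev)          ((dev) & 0xff)
    #define MKDEV(major, minor) (((major) << 8) | (minor))

    int main(void)
    {
        unsigned int dev = MKDEV(8, 1);     /* /dev/sda1: major 8, minor 1 */

        printf("dev 0x%04x -> major %u, minor %u\n",
               dev, MAJOR(dev), MINOR(dev));
        return 0;
    }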

That is not a whole lot of device numbers. Some types of devices have needed more than 256 minor numbers for some time; SCSI disks and pseudo terminals are a couple of obvious examples. To make up for the lack of minor numbers, these devices have multiple major numbers assigned. But major numbers, too, are in short supply: a look at the current device number allocation document shows that only numbers 226-239 are unassigned.

So it has been accepted wisdom for a while that dev_t needs to grow. It is generally expected that the 2.5 development series will create a larger dev_t, and deal with the various user space compatibility issues that this change will cause. In fact, most of those issues will be relatively minor; glibc has been using a much larger dev_t for some time already. Thus, most applications should not notice the change. There are some exceptions, though: tar files, for example, have 8-bit major and minor numbers built into them.

While there is agreement on the need to grow dev_t, it has become clear that there is little consensus on how big the type should become. Andries Brouwer started a little storm with a posting stating that a 64-bit value should be used. 64 bits is what glibc uses, and it would be large enough to not run out anytime soon, even if "sparse" allocation schemes are used.

Linus, however, replied by saying, flat out, that a 64-bit dev_t would not be accepted. His proposal is to go to a 32-bit value, with twelve bits for major numbers and twenty for minor numbers. His reasoning, essentially, is:

  • Major numbers do not need much expansion; we have not, yet, even managed to exhaust eight bits. Since major numbers tend to be used in table lookups (to find the driver when a device is opened, for example), the major number space should not be so large that the lookup table takes too much memory.

  • There is a need for more minor numbers, especially for things like pseudo terminals on large, multiuser systems. But twenty bits should be more than enough even for that use.

Linus sees 64-bit device numbers as wasteful kernel bloat which encourages bad habits and, perhaps most importantly, runs contrary to the direction he wants to go. His plan appears to be to move away from static major numbers for most devices: rather than having a dedicated major number, a device driver should allocate one dynamically when it initializes and export it to user space via /proc. Either that, or it should just use devfs, which takes device numbers out of the picture for the most part.
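
For illustration, here is a sketch of how a 32-bit dev_t split into twelve major bits and twenty minor bits might look. This is not code from any kernel - the layout and macro names are invented for the example.

    /*
     * Sketch of a 32-bit dev_t split into twelve major bits and twenty
     * minor bits, as proposed.  Layout and macro names are invented
     * for illustration; this is not code from any kernel.
     */
    #include <stdio.h>

    #define MINOR_BITS 20
    #define MINOR_MASK ((1u << MINOR_BITS) - 1)

    #define MAJOR32(dev)           ((unsigned)(dev) >> MINOR_BITS)
    #define MINOR32(dev)           ((unsigned)(dev) & MINOR_MASK)
    #define MKDEV32(major, minor)  (((unsigned)(major) << MINOR_BITS) | (minor))

    int main(void)
    {
        /* A hypothetical pseudo terminal with a very large unit number. */
        unsigned int dev = MKDEV32(136, 100000);

        printf("major %u, minor %u (room for %u majors and %u minors)\n",
               MAJOR32(dev), MINOR32(dev), 1u << 12, 1u << MINOR_BITS);
        return 0;
    }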

That last idea is likely to draw some complaints. The inclusion of devfs in the kernel shut down most of the flame wars, but a lot of people still do not like it and do not configure it into their systems. If devfs becomes a required component in the future, expect some disgruntlement in the ranks.

The 2.5 development kickoff summit for kernel hackers is happening in San Jose on March 30 and 31; it is sponsored by VA Linux Systems. This invitation-only event will host most of the planet's active Linux kernel hackers, and should lay much of the groundwork for the upcoming 2.5 development series. A preliminary agenda has been posted, showing some of the topics up for discussion.

Your humble kernel page editor managed to wrangle an invitation based on his device driver book work, and hopes to be able to do some interesting reporting from the summit - to the extent that can be done without hindering the free and open nature of the discussion.

Other patches and updates released this week include:

  • Justin T. Gibbs has posted version 6.1.8 of the aic7xxx SCSI driver. Among other things, the latest version fixes a build-time quirk that required those building the 2.4.3 prepatch kernels to have Berkeley DB1 installed on their systems.

  • Keith Owens has released modutils 2.4.5 and kdb v1.18.

  • Keith has also posted a proposal for a new kernel interface which would provide efficient access to the performance monitoring registers on large, multiprocessor systems.

  • Eric Raymond posted several updates to the CML2 configuration system, culminating in CML2 0.9.7. Along the way, there was a strong debate on the renaming of a number of configuration symbols and a push by Eric to get CML2 incorporated into the 2.4 kernel before the 2.5 development series starts. Alan Cox, however, has indicated that he is not willing to do that.

  • Jonathan Morton has released a patch which enables the kernel to run in a mode where it does not overcommit memory, making it much harder to find oneself in an "out of memory" situation. The patch also makes some tweaks to the OOM killer.

  • Richard Gooch has announced devfs-v99.20 - a backport of the device filesystem to the 2.2.19 kernel.

Section Editor: Jonathan Corbet


March 29, 2001
