Kernel development


The current kernel release is still 2.4.9. The latest prepatch from Linus is 2.4.10-pre4, which was released on September 3; it contains the usual array of fixes and updates. Also included is a new set of functions for access to the PCI configuration space; how this access is done has changed somewhat, but the API visible to drivers and such remains the same. A large PowerPC update is also part of this patch.

Linus has kept a relatively low profile on linux-kernel since this patch came out.

Alan Cox's latest is 2.4.9-ac9. It contains a merge of 2.4.10-pre4 and many more changes, including a set of knobs for virtual memory tuning, a new MODULE_LICENSE tag (see below), a big PowerPC-64 merge, and more.

Andrea Arcangeli has released 2.4.10pre4aa1, which contains some direct and raw I/O fixups and User-mode Linux.

License tagging in modules is now a part of the "ac" kernel series. A new macro has been added, and all loadable modules should specify their licensing with a line like:

    MODULE_LICENSE("GPL");

The next version of the modutils package (and the insmod command in particular) will complain when presented with modules that lack the license metadata. People who maintain modules will probably want to add these tags soon.
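
As a rough illustration of how a loader could look for the license tag, here is a minimal user-space sketch. It assumes the tag ends up as a `license=...` entry among NUL-terminated "key=value" metadata strings; that layout and the helper name `find_module_license` are assumptions made for this example, not a description of what modutils actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: search a buffer of NUL-terminated "key=value"
 * strings (an assumed layout for module metadata) for a "license=" entry.
 * Returns a pointer to the license value, or NULL if no tag is present.
 */
static const char *find_module_license(const char *info, size_t len)
{
    static const char key[] = "license=";
    const size_t keylen = sizeof(key) - 1;
    size_t pos = 0;

    assert(info != NULL || len == 0);
    while (pos < len) {
        const char *start = info + pos;
        /* Each entry must be NUL-terminated inside the buffer. */
        const char *end = memchr(start, '\0', len - pos);
        if (end == NULL)
            break;
        if ((size_t)(end - start) >= keylen &&
            memcmp(start, key, keylen) == 0)
            return start + keylen;
        /* Skip this entry and its terminating NUL. */
        pos += (size_t)(end - start) + 1;
    }
    return NULL;
}
```

A loader built along these lines could warn whenever the function returns NULL, which corresponds to the "lacks the license metadata" case described above.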

Some people have, reasonably, asked what the purpose of this information is. The answer is that there are a few things one could do with licensing information; for example, one can imagine a tool that verifies that a particular system is running only free code. The Lineo GPL Compliance Toolset could make use of this information.

The real purpose, however, is that Alan Cox is tired of receiving bug reports from people who are running proprietary modules in their systems, and wants an easy way to throw them out.

Unfortunately I get so many bug reports caused by the nvidia modules and people lying when asked if they have them loaded that some kind of action has to occur, otherwise I'm going to have to stop reading bug reports from anyone I don't know personally.

In other words, the loading of a proprietary module will "taint" a running kernel, and greatly reduce the user's chance of getting help from the core kernel hackers. This has always been the case; the only change is that it has, evidently, become necessary for the kernel to track its own taintedness.

This tracking will happen via a sysctl flag like /proc/sys/kernel/tainted; the loading of a non-GPL module (or one lacking license information) will cause that flag to be set. Once set, the tainted flag cannot be reset without rebooting. The tainted flag will be printed whenever the system panics, and post-mortem tools (e.g. ksymoops) will recover it as well. So anybody trying to track down a kernel problem will be able to see quickly if proprietary modules have ever been loaded.
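
The "set once, never cleared" behavior can be sketched in a few lines of user-space C. This is only a model of the idea: the function names are hypothetical, and treating the exact string "GPL" as the only acceptable license is a simplification made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sticky flag: nothing in this file ever clears it once it is set. */
static int tainted = 0;

/* Simplification: only the exact string "GPL" counts as a free license. */
static int license_is_gpl(const char *license)
{
    return license != NULL && strcmp(license, "GPL") == 0;
}

/* Hypothetical hook, called once for every module that gets loaded. */
static void note_module_load(const char *license)
{
    /* A missing tag (NULL) is treated the same as a non-GPL license. */
    if (!license_is_gpl(license))
        tainted = 1;
}

/* What a panic message or post-mortem tool would report. */
static int kernel_is_tainted(void)
{
    return tainted;
}
```

Loading a GPL module after a proprietary one leaves the flag set, which is the point: the flag records whether such a module has ever been loaded, not whether one is loaded now.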

Of course, if users lie about which modules they load, they could conceivably mess with the tainted setting. But people aren't too worried about that happening; most users who would be able to do that are probably not the type who actually would. And, besides, as Alan points out, in the U.S. such an act could be seen as defeating a digital rights management scheme, and subject the guilty party to a five-year prison sentence, plus extra for conspiracy...

The case of the conflicting block ioctls. How do you access the last sector on an odd-sized disk? The Linux kernel (normally) likes to deal with a 1K block size, which (normally) gets mapped into two contiguous, 512-byte sectors on a disk drive. But, if the drive contains an odd number of sectors, this scheme leaves the last sector unreachable. That is not normally considered to be a big problem; one missing sector does not make a very large dent in the capacity of a modern disk drive.
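
The arithmetic behind the missing sector is simple enough to write down. The sketch below uses the 512-byte sector and 1K block sizes mentioned above; the function names are invented for the example.

```c
#include <assert.h>

#define SECTOR_BYTES 512u   /* bytes per disk sector */
#define BLOCK_BYTES  1024u  /* bytes per block: two sectors */

/* Number of whole blocks that fit on a disk of nsectors sectors. */
static unsigned long long whole_blocks(unsigned long long nsectors)
{
    return nsectors / (BLOCK_BYTES / SECTOR_BYTES);
}

/* Trailing sectors that no whole block covers (0 or 1 with these sizes). */
static unsigned long long unreachable_sectors(unsigned long long nsectors)
{
    return nsectors % (BLOCK_BYTES / SECTOR_BYTES);
}
```

For a disk with 1001 sectors, `whole_blocks` gives 500 and `unreachable_sectors` gives 1: sectors 0 through 999 are covered by blocks, and sector 1000 falls off the end.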

It turns out, however, that the IA-64 architecture has defined a new partitioning scheme which stores a copy of the partition table in the last sector on the disk. With this scheme, it matters if that sector is not reachable - there is no way for an administrator to change the partition table when running under Linux. This kind of limitation can lead administrators to do irrational things, like install Windows. Clearly a fix was required.

So, back in February, Michael Brown created a new ioctl call specifically to provide access to the last sector on a disk; that call is now part of the IA-64 port. It is not, however, to be found in the mainstream kernel at this time, which is part of the problem.

Ben LaHaise, meanwhile, needed an ioctl call that would retrieve the size of a device as a 64-bit quantity - disks are getting big, after all. So he put together a patch with the new ioctl call. Part of his patch was to the ext2 utility programs; that patch was accepted and distributed as part of the e2fsprogs distribution a little while back.

The problem: both new ioctls needed a new ioctl number. The block I/O ioctl numbers are defined in linux/fs.h, and it is natural to pick the next one in series. There is no central registry for these ioctl numbers other than the source itself; if you have not put in a patch reserving a given ioctl number, it's not really yours. Unfortunately, Michael Brown did not put in any such patch. Ben LaHaise also failed to do so before (accidentally) getting the ioctl number included in the e2fsprogs distribution. Of course, both chose the same number.
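
How two developers end up with the same number can be modeled with a small table. The sketch below is purely illustrative: the structure, the function names, and any numbers fed to it are made up for the example and are not the values in linux/fs.h.

```c
#include <assert.h>
#include <stddef.h>

struct ioctl_def {
    const char *name;   /* symbolic name of the ioctl */
    unsigned int nr;    /* sequence number within the series */
};

/* Next free number "in series": one past the highest number in use. */
static unsigned int next_in_series(const struct ioctl_def *defs, size_t n)
{
    unsigned int max = 0;

    assert(defs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (defs[i].nr > max)
            max = defs[i].nr;
    return max + 1;
}

/*
 * Return the index of the first entry whose number duplicates an earlier
 * entry, or -1 if all numbers are unique.
 */
static int first_collision(const struct ioctl_def *defs, size_t n)
{
    for (size_t i = 1; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (defs[i].nr == defs[j].nr)
                return (int)i;
    return -1;
}
```

If two people each call `next_in_series` against the same published table and neither result is merged back before the other looks, both get the same answer, and `first_collision` only reports the clash once the two definitions finally meet in one tree.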

This week, Ben put in a patch to reserve the number for his ioctl. His reasoning: renumbering the IA-64 ioctl will be less disruptive than changing e2fsprogs. He also believes that the ioctl is the wrong solution to the problem; it should have been fixed for all systems in the general block code, rather than being an IA-64-specific ioctl.

Michael has also sent in a patch trying to reserve the same ioctl number. Just asking for a number is not enough, though, as can be seen from Alan's reaction to Michael's patch:

Rejected. I still think this is an ugly evil hack and want no part in it.

Ben, meanwhile, gave up on the old ioctl number and put in a new patch using a higher number. That one, too, turned out to be problematic, causing BLKGETSIZE64 to move up one more time...

A new 64-bit PCI interface has been posted by David Miller. This iteration is different from previous versions in that it looks a lot more like the standard, 32-bit interface. All of the pci64_ calls have gone away, and the dma_addr_t type can be used in all drivers again. There is a new set of pci_dac_ functions for drivers needing (and able to support) a 64-bit DMA space.

It has been pointed out that the PCI interface still lacks one important capability - peer-to-peer DMA transfers. There are situations where it would be helpful to move data directly between two PCI devices; for example, moving an image from a video capture device directly to video memory. There is some interest in supporting this sort of operation; an API will likely be developed in the near future.

Page aging is broken? Much work is going into the improvement of the virtual memory system in 2.4 - one of the biggest remaining problems. It would be hard to summarize everything here, but one development stands out: Jan Harkes has discovered that the page aging algorithm in the kernel does not work at all.

Page aging is the process of tracking the usage of pages in memory in the hopes of identifying those which have not been used in the longest time. The "oldest" pages are the first candidates to throw out when memory is tight. The 2.4 kernel, however, is aging pages so aggressively that almost all pages on the system look ancient. So a significant part of the VM system is essentially inactive, and nobody noticed until now.
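
To see how an aging policy can make nearly everything look old, consider a toy model in which a page's age goes up by a fixed step when it has been referenced and is halved when it has not. Both the policy and the constants below are assumptions chosen for illustration; they are not taken from the 2.4 source.

```c
#include <assert.h>

#define AGE_ADV 3u    /* illustrative: added when a page was referenced */
#define AGE_MAX 64u   /* illustrative: upper bound on a page's age value */

/* One aging pass over a single page; returns the page's new age. */
static unsigned int age_page(unsigned int age, int referenced)
{
    if (referenced) {
        age += AGE_ADV;
        if (age > AGE_MAX)
            age = AGE_MAX;
    } else {
        age /= 2;   /* exponential decay when the page was not touched */
    }
    return age;
}

/* Number of unreferenced passes before a page's age reaches zero. */
static unsigned int scans_until_zero(unsigned int age)
{
    unsigned int scans = 0;

    while (age > 0) {
        age = age_page(age, 0);
        scans++;
    }
    return scans;
}
```

Under these assumptions, even a page at the maximum age drops to zero after seven unreferenced passes, while climbing back up takes many referenced ones. If the scanner runs often relative to how frequently pages are touched, most ages collapse toward zero and stop distinguishing hot pages from cold ones.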

Alan Cox responded with a claim that the "ac" series has better VM performance due to a more disciplined approach to VM patches. Jan Harkes pointed out that the "ac" series has serious page aging problems as well. "I guess it is just more carefully papering over the existing problems."

The solution, according to Rik van Riel, is to be found in the "reverse mapping" patch that he is currently working on. The current page aging scheme looks at virtual memory, via process page tables. It would be far more efficient to look at physical memory, since that is, in the end, the resource that is being managed. But it is currently difficult to find the page tables that reference a given physical page. Once reverse mapping is in place, a lot of page aging (and VM in general) problems should become easier to manage. Of course, reverse mapping looks like a fairly serious patch to be considering for the 2.4 stable series... (Those interested in trying out the reverse mapping patch should look at this posting for the latest version and a changelog).
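
The idea of reverse mapping can be sketched as a data structure: each physical page carries a chain of pointers to the page table entries that map it, so the "was this page used?" question can be answered starting from the page. The types and names below are invented for this example and are not those of Rik van Riel's patch.

```c
#include <assert.h>
#include <stddef.h>

#define PTE_ACCESSED 0x1u     /* toy "referenced" bit in a page table entry */

typedef unsigned int toy_pte_t;

/* One link in a physical page's chain of mappers. */
struct rmap_link {
    toy_pte_t *pte;
    struct rmap_link *next;
};

/* A physical page and the head of its reverse-mapping chain. */
struct phys_page {
    struct rmap_link *mappers;
};

/* Record that *pte maps this page; the caller supplies the link storage. */
static void rmap_add(struct phys_page *page, struct rmap_link *link,
                     toy_pte_t *pte)
{
    assert(page != NULL && link != NULL && pte != NULL);
    link->pte = pte;
    link->next = page->mappers;
    page->mappers = link;
}

/*
 * Test and clear the accessed bit in every entry mapping the page.
 * Returns 1 if any mapper had touched the page since the last call.
 */
static int page_referenced(struct phys_page *page)
{
    int referenced = 0;

    for (struct rmap_link *l = page->mappers; l != NULL; l = l->next) {
        if (*l->pte & PTE_ACCESSED) {
            referenced = 1;
            *l->pte &= ~PTE_ACCESSED;
        }
    }
    return referenced;
}
```

Without the chain, answering the same question means walking every process's page tables looking for entries that happen to point at the page, which is the inefficiency described above.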

Other patches and updates released this week include:

  • The min/max discussion continues; Peter Breuer has submitted a version of the macros which addresses the worst of the type issues that Linus was trying to solve with the three-argument version. Linus has indicated that he likes this solution. The min/max macros may not yet have reached their final form. Of course, it has also been shown that things can be taken too far...

  • Yves Rougy has announced yet another set of filesystem benchmarks.

  • Joe Thornber is working on a new LVM implementation; there is a test version available, and he is looking for comments from interested parties.

  • Release 1.2 of the 2.5 kernel build system is available from Keith Owens.

  • Release 1.0.4 of IBM's journaling filesystem is available.

  • Peter Braam has released version 1.0.5.1 of the InterMezzo filesystem.

  • Version 1.1.2 of the Rule Set Based Access Control patch has been released by Amon Ott.

  • Jari Ruusu has announced version v1.4d of the loop-AES encrypted filesystem.

  • Greg Kroah-Hartman has released a new version of the Compaq Hotplug PCI driver.

  • Greg has also posted a new security module patch.

  • The Stanford Checker has found a new set of potential security problems in the kernel.

  • Andreas Gruenbacher has posted a new access control list patch.

  • Version 0.8.5 of the PCTEL "linmodem" driver has been announced by Jan Stifter.

  • Harald Welte has released iptables-1.2.3.

Section Editor: Jonathan Corbet


September 6, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds