LWN.net

Kernel development


The current kernel release is still 2.4.6. The latest 2.4.7 prepatch from Linus is 2.4.7pre6, which contains another set of fixes and updates.

Alan Cox's 2.4.6ac2 was released on July 7; as usual, it contains a rather longer list of fixes than Linus's prepatch. Andrea Arcangeli also has a prepatch (2.4.7pre5aa1) out with a set of fixes, mostly to the core kernel code.

On the 2.2 front, the current prepatch is 2.2.20pre7, released on July 4.

Building the initial root filesystem into the kernel image. One often-heard theme in the ACPI discussion (covered last week) was that it would be nice to move much of the ACPI setup into user space. That way, perhaps, the kernel's memory footprint would not have to be bloated by a few hundred kilobytes of ACPI code. But with current kernels, doing boot-time work in user space means using the "initrd" (initial RAMdisk) mechanism, which is not universally loved; even Linus dislikes it.

But there are advantages to having an initial root filesystem handy; it's the clunkiness of the initrd interface that people object to. So, Linus has another idea: why not just append a root filesystem, in tar format, to the kernel executable image? That way, it can be set up in an entirely automatic way, and everything the kernel needs will be right at hand. Linus likes this idea enough that he would likely make it a mandatory part of the boot process.
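
Part of tar's appeal here is that the format is trivial to walk. As a rough illustration (not how any kernel actually unpacks such an archive), here is a minimal C sketch that iterates over a ustar archive held in memory. It assumes the standard POSIX ustar layout: 512-byte headers, with the member name at offset 0 and its size, in octal, at offset 124. The function names are hypothetical, and the sketch ignores type flags, checksums, and links.

```c
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK 512

/* Parse an octal numeric field from a ustar header. Some writers pad
 * with leading spaces; parsing stops at the first non-octal character. */
static size_t tar_octal(const char *field, size_t len)
{
    size_t i = 0, value = 0;

    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (size_t)(field[i] - '0');
    return value;
}

/* Hypothetical callback, invoked once per archive member. */
typedef void (*tar_visit_fn)(const char *name, const unsigned char *data,
                             size_t size, void *ctx);

/* Walk an in-memory tar archive. Returns the number of members visited,
 * or -1 if the archive is truncated. A header with an empty name is
 * treated as the end of the archive (a simplification: the format
 * actually ends with two zero-filled blocks). */
static int tar_walk(const unsigned char *buf, size_t len,
                    tar_visit_fn visit, void *ctx)
{
    size_t off = 0;
    int count = 0;

    while (off + TAR_BLOCK <= len) {
        const char *hdr = (const char *)(buf + off);
        char name[101];
        size_t size;

        if (hdr[0] == '\0')
            return count;               /* end-of-archive marker */
        memcpy(name, hdr, 100);         /* name field is not always NUL-terminated */
        name[100] = '\0';
        size = tar_octal(hdr + 124, 12);
        off += TAR_BLOCK;               /* skip past the header */
        if (size > len - off)
            return -1;                  /* member data runs off the end */
        visit(name, buf + off, size, ctx);
        count++;
        /* member data is padded to a whole number of 512-byte blocks */
        off += (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
    }
    return off == len ? count : -1;
}
```

A boot-time unpacker along these lines might create each file in a RAM-based filesystem from inside the visit callback; no special loader or separate image file is needed, since the archive is simply a byte range at the end of the kernel image.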

Once the initial root is part of the kernel image, a lot of boot-time work can move into it. For example, the whole process of finding the real, permanent root filesystem, then finding and running the init process, could live there. That would remove a bunch of code from the kernel itself and make the boot process far easier to customize for specific situations. It would no longer be necessary to have a DHCP implementation in the kernel for diskless systems. One could even put the kernel configuration file there, satisfying a perennial request.

Since, with a proper implementation, most users would never even need to know that this "piggyback" filesystem is present, it seems likely to be implemented in the 2.5 series.

How to do 64-bit PCI DMA? In past weeks we have looked at efforts to make it possible to perform DMA I/O operations from anywhere in the first 4GB of memory on the system. That would be a significant improvement over the current situation, but it still leaves out an important case. These days, large server systems can contain well over 4GB of memory, and there do exist PCI cards which can perform DMA with 64-bit physical addresses. For such systems, wouldn't it be nice to take advantage of the 64-bit mode and eliminate the hassles of memory zones and bounce buffers entirely?

The folks working on the IA-64 port decided this would be a good idea. Accordingly, they turned dma_addr_t (an internal "cookie" type used by the DMA support routines) into a 64-bit quantity, and changed the semantics of pci_set_dma_mask() to allow drivers to specify that their hardware can do 64-bit DMA. This interface works for the immediate needs the IA-64 porters had, but David Miller, who "owns" the PCI DMA interface, has made it clear that he opposes moving it to the other architectures. Instead, he wants to see a more comprehensive 64-bit DMA interface designed and implemented in the 2.5 development series. (Those interested in the current interface, incidentally, should see the excellent DMA-mapping.txt file in the kernel source's Documentation directory.)
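
A DMA mask, such as the one a driver hands to pci_set_dma_mask(), is essentially a statement of which bus addresses the device can generate. The user-space model below (not kernel code; the function and macro names are hypothetical) shows the check that decides whether a buffer can be used directly or must be copied through a bounce buffer, and why a 32-bit dma_addr_t cannot carry an address above 4GB at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks: the highest bus address a device can generate. */
#define DMA_MASK_32 UINT64_C(0x00000000ffffffff)
#define DMA_MASK_64 UINT64_C(0xffffffffffffffff)

/* Can a device with this mask reach [phys, phys + len) directly?
 * If not, the kernel must copy the data through a bounce buffer
 * located in memory the device can reach. */
static bool dma_reachable(uint64_t mask, uint64_t phys, uint64_t len)
{
    if (len == 0)
        return true;
    uint64_t last = phys + len - 1;
    if (last < phys)
        return false;               /* range wraps past 2^64 */
    return (last & ~mask) == 0;     /* no address bits above the mask */
}

/* A 32-bit cookie type silently drops the high bits of an address
 * above 4GB; this reports whether the address survives the truncation. */
static bool fits_in_32bit_cookie(uint64_t bus_addr)
{
    return (uint32_t)bus_addr == bus_addr;
}
```

For a 32-bit device, a buffer ending exactly at 0xffffffff is fine, but one byte more requires a bounce buffer; widening the cookie type, as the IA-64 port did, is what lets a 64-bit-capable device be handed the address directly.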

Some people are unhappy with that position; after all, anything deferred to 2.5 might not see a stable release for another two years. But David's objections make some sense, and they give an interesting view into the issues you have to take into account when designing this sort of interface. The discussion may look like a complaint session, but it is really the initial design work for a high-performance DMA interface.

Some of the issues with the simple extension used by IA-64 are:

  • There is little desire to expand dma_addr_t to 64 bits when the vast majority of its users will never perform 64-bit DMA. An extra 32 bits of temporary space may seem small compared to the cost of performing an I/O operation, but every bit counts. So a more likely solution is a new type (dma64_addr_t, perhaps) and a separate interface to go with it.

  • On some systems and peripherals, 64-bit DMA is significantly slower than the standard, single-cycle 32-bit version. On such systems, 32-bit DMA may be preferable even if it means having the CPU copy data through bounce buffers.

  • Reasonable hardware (quite a bit of hardware isn't) includes an I/O memory management unit (IOMMU) which provides a type of virtual memory for peripherals. The IOMMU can cause all operations to occur within the 32-bit range. It also has the nice feature of making scattered pages look physically contiguous. On such systems, you normally do not want to bother with 64-bit operations...

  • ...except in cases where you will be performing very large transfers. In the worst case, huge operations can take up most or all of the IOMMU mapping registers, choking I/O in the rest of the system. Devices with this sort of I/O pattern are better off using 64-bit I/O even if it is slower. A 64-bit DMA interface must allow a driver to make this sort of decision.

  • The IA-64 scheme will not work well on 32-bit systems (which can still have 64-bit physical addresses) because it relies on the existence of kernel virtual addresses for the DMA buffers. 32-bit systems with large amounts of memory do not have kernel-space mappings for much of that memory. A truly portable interface must use struct page pointers rather than virtual addresses.
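
Several of these points interact, so a toy model may help. The sketch below is a user-space simulation, not kernel code, and every name, constant, and threshold in it is hypothetical. A tiny IOMMU with eight mapping slots makes scattered page frames appear contiguous at a bus address below 4GB; a request larger than the remaining free slots fails; and a driver-side policy falls back to 64-bit dual address cycle (DAC) addressing for large transfers when the device supports it. Pages are identified by frame number rather than by kernel virtual address, echoing the last point above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define IOMMU_SLOTS 8                    /* hypothetical; real IOMMUs have many more */
#define IOMMU_BASE  UINT64_C(0x80000000) /* hypothetical bus window below 4GB */

/* Toy IOMMU: slot i maps bus page (IOMMU_BASE >> PAGE_SHIFT) + i
 * to an arbitrary physical page frame number. */
struct iommu {
    uint64_t pfn[IOMMU_SLOTS];
    bool used[IOMMU_SLOTS];
};

/* Map npages (possibly scattered) page frames onto one contiguous bus
 * range. Frames are named by number, so memory with no kernel virtual
 * mapping can still be described. Returns the bus address, or 0 if no
 * run of free slots is long enough. */
static uint64_t iommu_map(struct iommu *mmu, const uint64_t *pfns, size_t npages)
{
    size_t run = 0;

    if (npages == 0 || npages > IOMMU_SLOTS)
        return 0;
    for (size_t i = 0; i < IOMMU_SLOTS; i++) {
        run = mmu->used[i] ? 0 : run + 1;
        if (run == npages) {
            size_t start = i + 1 - npages;
            for (size_t j = 0; j < npages; j++) {
                mmu->pfn[start + j] = pfns[j];
                mmu->used[start + j] = true;
            }
            return IOMMU_BASE + ((uint64_t)start << PAGE_SHIFT);
        }
    }
    return 0;
}

/* Translate a bus address back to a physical address, as the IOMMU
 * hardware would. The bus address must lie inside a mapped range. */
static uint64_t iommu_translate(const struct iommu *mmu, uint64_t bus)
{
    size_t slot = (size_t)((bus - IOMMU_BASE) >> PAGE_SHIFT);
    return (mmu->pfn[slot] << PAGE_SHIFT) | (bus & ((UINT64_C(1) << PAGE_SHIFT) - 1));
}

enum dma_mode { DMA_VIA_IOMMU, DMA_DIRECT_64 };

/* Hypothetical driver policy: prefer the single-cycle 32-bit path through
 * the IOMMU, but if a transfer would need more than a quarter of the
 * mapping slots, use slower 64-bit addressing (if the device supports
 * it) so that other devices are not starved of mappings. */
static enum dma_mode choose_dma_mode(size_t npages, bool dev_can_dac)
{
    if (npages <= IOMMU_SLOTS / 4)
        return DMA_VIA_IOMMU;
    return dev_can_dac ? DMA_DIRECT_64 : DMA_VIA_IOMMU;
}
```

Mapping three scattered frames yields one contiguous three-page bus range starting at IOMMU_BASE, after which a six-page request fails because only five slots remain: a miniature version of the register exhaustion described above. The quarter-of-the-slots threshold is arbitrary; the point is only that the interface must leave this decision to the driver.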

Chances are good that some sort of 64-bit DMA API which addresses the above issues will find its way into 2.5. Thereafter, it may even be backported to 2.4, at which point it will be widely available.

Other patches and updates released this week include:

  • Marcelo Tosatti has a patch which provides improved virtual memory statistics from the kernel.

  • Eric Raymond has posted a State of CML2 message on where the new configuration system is. ("The dungeon walls in CML2 adventure now occasionally feature entertaining grafitti. Spot all the in-jokes and collect a valuable no-prize.")

  • Alexander Viro is looking for testers for his patch to the Minix filesystem that moves directories into the page cache.

  • Andrew Morton has a new ext3 patch for 2.4 kernels. This patch is not just a port, though; he has included a number of fixes and has also reworked things to minimize the number of changes required to the core kernel.

  • Davide Libenzi has posted a new /dev/poll implementation which, he claims, provides the most efficient notification interface for busy network servers.

  • The latest security module patch was posted on July 6.

  • Harald Welte will be giving presentations on netfilter at several upcoming Linux events.

  • Greg Kroah-Hartman has posted an updated Compaq/Intel PCI hotplug driver.

  • devfs v182 was released by Richard Gooch.

Section Editor: Jonathan Corbet


July 12, 2001

Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds