[LWN.net]

Kernel development


The current kernel release is still 2.4.5. Linus is back from his trip to Japan, and has released the first 2.4.6 prepatch. It contains the usual scattering of fixes, including some aimed at the ongoing virtual memory problems with the 2.4 kernel series.

The prepatch also contains a bug that can lead to unresolved symbols in some modular kernels. Ingo Molnar produced a simple fix which gets around the problem; after several iterations he also released a much more involved fix dealing with a number of other difficulties introduced in 2.4.6pre1.

Alan Cox, meanwhile, is up to 2.4.5ac9. Along with the usual fixes he has included a new driver for the Sony Vaio I/O controller, the new improved Configure.help file (see below), and a number of fixes for problems found by the Stanford checker.

Another approach to bounce buffers. The discussion last week on virtual memory and bounce buffers passed over one interesting approach to fixing the problem. We'll try to make up for that this week, but doing so requires a little bit of background in how Linux memory management works. The following discussion is somewhat specific to the x86 architecture, but the concepts carry over to any 32-bit system.

On a processor with 32-bit addresses, a total of 4GB of memory may be addressed. Linux systems have traditionally not been able to handle that much memory, however, due to the way memory is laid out. For some time, the virtual address space has been broken up as shown in this diagram:

[Virtual memory layout]

(Please excuse your editor's crude use of the "dia" tool.)

Thus, any individual user-space process may have up to 3GB of address space, with the uppermost 1GB being reserved for the kernel. 2.2 kernels always laid out memory in this way, and 2.4 still does by default. Up through 2.2, the kernel mapped the entire range of physical memory into its portion of the address space, since that mapping provided easy, direct access to all of the memory on the system. It made life easy for kernel hackers, but it also limited the total amount of memory on the system to the amount that could be mapped in the kernel segment: 1GB, with subtractions for things like the PCI I/O memory space. That is why 2.2 kernels could only make use of about 960MB of memory.
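The arithmetic behind those numbers can be sketched in a few lines of C. This is only an illustration: the function names are made up for this example, the kernel base address is simply the 3GB point implied by the layout above, and the 64MB reserve used in the usage note is an assumption chosen because it reproduces the "about 960MB" figure, not a value taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; none of these are copied from a kernel header. */
#define MB                  (1ULL << 20)
#define ADDRESS_SPACE_BYTES (1ULL << 32)    /* 32-bit addresses reach 4GB */
#define KERNEL_BASE         0xC0000000ULL   /* assumed start of the kernel segment (3GB) */

/* Virtual address space available to a single user-space process. */
static uint64_t user_space_bytes(void)
{
    return KERNEL_BASE;
}

/* Virtual address space reserved for the kernel. */
static uint64_t kernel_segment_bytes(void)
{
    return ADDRESS_SPACE_BYTES - KERNEL_BASE;
}

/*
 * Physical memory that can be directly mapped into the kernel segment
 * once reserved_bytes has been set aside for other mappings (such as
 * PCI I/O memory). The size of the reserve is the caller's assumption.
 */
static uint64_t direct_map_limit(uint64_t reserved_bytes)
{
    assert(reserved_bytes <= kernel_segment_bytes());
    return kernel_segment_bytes() - reserved_bytes;
}
```

With an assumed reserve of 64MB, direct_map_limit(64 * MB) comes out to 960MB, matching the limit described above; a larger reserve lowers the limit accordingly.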

The 2.4 release lifted that restriction by enabling the kernel to work with memory that is not directly mapped. The result was (1) the ability to handle up to 64GB of memory on x86 systems, and (2) the creation of a new class of memory, "high memory," which is a little trickier to work with. So physical memory is now divided into three zones, as shown by another ugly diagram:

[Physical memory layout]

The "DMA" zone is memory which is addressable by old ISA peripherals that can only do 24-bit DMA, meaning the lowest 16MB; "normal" is memory above 16MB which is directly mapped into the kernel; and "high memory" is memory which is not directly mapped. On systems with tremendous amounts of memory, most of that memory is "high memory."
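For readers who prefer code to ugly diagrams, the three-zone split might be modeled as below. This is a sketch, not kernel code: the enum, the function name, and the direct_map_limit parameter are all hypothetical, and the only boundary taken from the text is the 16MB reach of 24-bit ISA DMA.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1ULL << 20)

/* Hypothetical names for the three zones described in the text. */
enum sketch_zone { SKETCH_ZONE_DMA, SKETCH_ZONE_NORMAL, SKETCH_ZONE_HIGH };

/*
 * Classify a physical address. direct_map_limit is the (assumed) end of
 * the memory that the kernel maps directly; everything at or above it
 * is high memory.
 */
static enum sketch_zone classify_address(uint64_t phys_addr,
                                         uint64_t direct_map_limit)
{
    const uint64_t isa_dma_limit = 1ULL << 24;  /* 24-bit DMA reaches 16MB */

    assert(direct_map_limit >= isa_dma_limit);

    if (phys_addr < isa_dma_limit)
        return SKETCH_ZONE_DMA;
    if (phys_addr < direct_map_limit)
        return SKETCH_ZONE_NORMAL;
    return SKETCH_ZONE_HIGH;
}
```

On a machine with 4GB of RAM and a direct-map limit somewhere below 1GB, a function like this would put well over three quarters of physical memory into the high zone.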

Now, finally, we can get to the bounce buffer problem. With current 2.4 kernels, any memory which is in the DMA or normal zones may be used in DMA operations with reasonable devices on reasonable buses. When I/O must be performed to or from high memory, however, a bounce buffer is allocated in one of the lower zones. The data is copied through the bounce buffer in its travels between the device and its high memory home. On I/O-bound systems with a lot of high memory, bounce buffers can create a lot of pressure in the normal and DMA zones, leading to memory shortage problems. All that copying isn't entirely desirable either.

Jens Axboe looked at this problem and made an observation that, in retrospect, should have been fairly obvious. PCI devices can (usually) address 32 bits (4GB) of memory. When the kernel uses a bounce buffer for high memory below 4GB, it is really wasting time and memory. The kernel may not be able to address that memory directly, but the peripheral can. So why not just do the DMA operation directly and skip the bounce buffer?

So Jens announced a patch which does exactly that - at least, for block devices. (He neglected the little detail of where to find the patch; he filled that in a little later). This patch adds a fourth memory zone, called "DMA32," that sits between the top of the normal zone and the 4GB barrier. Whenever block I/O is being performed on memory in the DMA32 zone, it is done directly without the use of a bounce buffer. Bounce buffers are still required above 4GB; it's a rare peripheral that can reach memory that high. But, even in that case, the bounce buffer can live in the DMA32 zone.
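The change in policy can be summed up in a small sketch. It is not taken from the patch itself: the function names and the direct_map_limit parameter are invented for illustration, and the only hard number is the 4GB reach of a 32-bit PCI device mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB              (1ULL << 20)
#define PCI_32BIT_LIMIT (1ULL << 32)   /* a 32-bit PCI device reaches 4GB */

/* Policy described for current 2.4: bounce anything not directly mapped. */
static bool needs_bounce_unpatched(uint64_t phys_addr,
                                   uint64_t direct_map_limit)
{
    return phys_addr >= direct_map_limit;
}

/* Policy described for the patch: bounce only what the device cannot reach. */
static bool needs_bounce_patched(uint64_t phys_addr)
{
    return phys_addr >= PCI_32BIT_LIMIT;
}

/*
 * Highest address (exclusive) at which a bounce buffer may be placed.
 * Without the patch it must sit in directly mapped memory; with it,
 * anything below 4GB will do.
 */
static uint64_t bounce_pool_limit(bool patched, uint64_t direct_map_limit)
{
    assert(direct_map_limit <= PCI_32BIT_LIMIT);
    return patched ? PCI_32BIT_LIMIT : direct_map_limit;
}
```

A page at the 2GB mark needs a bounce buffer under the first policy but not under the second; a page at 5GB needs one either way, though the patched policy has far more memory in which to put it.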

The benefits of this patch are clear. Given that, in all likelihood, most systems with high memory have no more than 4GB, bounce buffers can be eliminated entirely in many cases. And for the rest, the available memory for the allocation of these buffers has increased. The patch was not included in 2.4.6pre1, but chances are good that a version of it will appear in a future release.

About that swapping problem. Problems with the use of swap space in 2.4.x were also mentioned last week. The amount of complaining has gone up recently, as more people try out the 2.4.5 kernel, which appears to be worse.

The response from the kernel hackers so far has been "make sure your swap area is at least twice as large as the amount of RAM in the system." That allows the kernel, essentially, to waste half of the swap space as a copy of what is currently in RAM, and actually swap to the other half. That technique helps, but a number of people are, not surprisingly, unimpressed with that requirement. 2.2 systems seemed to work better, after all. In fact, 2.2 had the same problem with swapping, but the more aggressive approach to caching in 2.4 has made the problem bite a lot more people.
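A deliberately crude model shows why the "twice RAM" rule helps. It assumes, as a simplification of the description above, that up to a RAM-sized portion of swap ends up holding copies of pages that are also in memory, so that only the remainder is available for real swapping. The function names are hypothetical and the model ignores everything else the VM does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Swap space left for pages that are not also resident in RAM, under
 * the simplifying assumption that a RAM-sized part of swap is taken up
 * by duplicates.
 */
static uint64_t effective_swap_mb(uint64_t ram_mb, uint64_t swap_mb)
{
    return swap_mb > ram_mb ? swap_mb - ram_mb : 0;
}

/* The rule of thumb quoted above: swap at least twice the size of RAM. */
static uint64_t recommended_swap_mb(uint64_t ram_mb)
{
    assert(ram_mb <= UINT64_MAX / 2);   /* guard against overflow */
    return 2 * ram_mb;
}
```

Under this model a 256MB system with 512MB of swap gets 256MB of real swap, while the same system with only 128MB of swap gets none at all, which is roughly the behavior people have been complaining about.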

Help is on the way, however. Marcelo Tosatti has posted a patch which cleans the junk out of swap space. Some testers have reported that it improves things for them. There is currently some debate, however, as to whether the locking used by the patch is safe. So it's probably not for everybody, yet. A different swap patch was posted by Mike Galbraith; it is new as of this writing and has not seen much testing yet. With luck, however, some variant of one of these patches will make it into a 2.4 kernel soon.

How should the kernel handle temperatures? David Welton pointed out that parts of the kernel that handle temperatures (generally watchdog drivers) are not consistent - some code uses Fahrenheit, and other parts use Celsius. He proposed a global configuration option to decide what should be used kernel-wide.

The response that came back will be familiar to linux-kernel watchers: the kernel should use one standard temperature format, and user-space tools can convert to other units if necessary. Not surprisingly, Fahrenheit has very few defenders as that standard. But the proponents of Celsius look like they will lose as well. If one is going to use standard units, one should do it right and use kelvins. That way nobody is happy.

Then again, one reader proposed that BogoDegrees be used instead...
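Should kelvins win, the user-space side of the bargain is not much work. The helpers below are a sketch of what a tool might do with a value reported by the kernel; the function names are made up, and the use of a floating-point kelvin value is an assumption, since the discussion did not settle on a representation.

```c
#include <assert.h>

/* Convert a temperature in kelvins to degrees Celsius. */
static double kelvin_to_celsius(double kelvin)
{
    assert(kelvin >= 0.0);   /* nothing is colder than absolute zero */
    return kelvin - 273.15;
}

/* Convert a temperature in kelvins to degrees Fahrenheit. */
static double kelvin_to_fahrenheit(double kelvin)
{
    return kelvin_to_celsius(kelvin) * 9.0 / 5.0 + 32.0;
}
```

Keeping the conversion out of the kernel means a driver reports one number in one unit, and the choice of what to display becomes a matter for whatever tool is doing the displaying.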

Configure.help is complete. Eric Raymond has announced that, after great effort, the kernel Configure.help file now contains help entries for every one of the 2699 known configuration symbols.

Of course, Eric knows how ephemeral such a victory can be. So he is also proposing a policy that no patches will be accepted unless they contain help entries for any new configuration symbols they introduce.

Section Editor: Jonathan Corbet


June 7, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds