[LWN.net]

Kernel development


The current development kernel release is 2.5.7, which was released on March 18. This will be the last such release for a while, since Linus has headed off for a two-week vacation. This release contains some fairly big patches, including:
  • The current ACPI patch from Andrew Grover and company. ACPI did not work particularly well in the prepatches, but a number of the problems have been dealt with for the 2.5.7 final release.

  • The NAPI work by Jamal Hadi Salim, Robert Olsson, and Alexey Kuznetsov. NAPI changes the way the kernel handles network traffic, with the intent of greatly improving performance on heavily loaded, high-bandwidth systems. It was discussed on this page back in October. (See also the NAPI HOWTO document that was merged into the kernel source.)
Also included is a bunch of USB work, an ALSA update, more reworking of the IDE code, some NFS work (including Alexander Viro's new "nfsd" filesystem), the fast user-space mutex ("futexes") patch, a VLAN code thrashup, and no end of other fixes. It's a big patch.

The latest from Dave Jones is 2.5.6-dj2, which adds a number of fixes and updates to the 2.5.7-pre2 kernel.

Guillaume Boissiere updated his 2.5 status summary on March 20.

The current stable kernel release is 2.4.18. The current 2.4.19 prepatch is 2.4.19-pre4; it includes a massive m68k update, the new video device code, and a great many other fixes.

Alan Cox's latest 2.4.19 patch is 2.4.19-pre3-ac4.

For those of you who aren't into all that bleeding-edge 2.4 stuff, David Weinehall has released 2.0.40-rc4 which should, with luck, turn into a real 2.0.40 soon.

Note that other kernel tree announcements now appear with the rest of the patches at the bottom of the page.

Kernel compilation benchmark update. When we last checked in with the fast-kernel-compile benchmark crowd (in last week's LWN Kernel Page) they had managed to get a kernel compilation down to just over 10 seconds. The record has fallen, however: Anton Blanchard has announced that he was able, through use of a 32-way PowerPC64 system, to build the benchmark kernel in 7.52 seconds. "...not a bad result for something running under a hypervisor."

Watch for sub-second kernel compilations, coming soon to a million-dollar machine near you...

The obligatory BitKeeper update. Marcelo Tosatti has announced that he is now using BitKeeper to manage the 2.4 code. See this note for information on how to access his tree.

"Discussions" of BitKeeper's licensing continue, not helped by the discovery of a temporary file race vulnerability in the BitKeeper installer. Readers of this page are more than familiar with the licensing arguments, though; we'll not repeat them this time.

Reworking the 2.4 VM patches. The word on the net for some time has been that the 2.4.x virtual memory subsystem almost works as it should; all that remains is to incorporate the last set of patches from Andrea Arcangeli. 2.4 maintainer Marcelo Tosatti has not yet integrated those patches, however; he has wanted to see them split up and documented so that he actually understands what he is putting in. This seems like a not unreasonable approach for a stable kernel maintainer to take. Thus far, however, Andrea has not found the time to rework his patches as requested, so they remain unapplied.

Andrew Morton has decided to try to break this logjam by reworking the patch and splitting it up into a form suitable for submission to Marcelo. Andrew has, in consultation with Linus, annotated the patches and made his own changes (including leaving a few patches out entirely). The result is an interesting view into what still needs to be fixed with the 2.4 virtual memory implementation; it's worth a detailed look.

Andrea's 10_vm-32 patch was split into 24 individual pieces. Andrew has dropped eight of those, leaving 16 patches for consideration:

  • aa-020-sync_buffers changes the way throttling of memory allocators is done. Throttling is done to slow down tasks to the point where the disk can keep up with their memory activity; this patch can cause memory allocators to wait until disk I/O initiated elsewhere in the system (as well as I/O they initiate themselves) has completed.

  • aa-030-writeout_scheduling improves how the "bdflush" kernel thread flushes dirty buffers to disk. Rather than try to write out every dirty buffer in the system in a single run, bdflush now stops partway through. The VM is also made less likely to block writing processes while their dirty buffers are flushed to disk; more of that work is now done asynchronously in bdflush. "This code works well. Fixes the problem where copying a large file between two disks only exercises one disk at a time."

  • aa-093-vm_tunables adds some knobs for run-time tweaking of VM performance. They consist of a set of ratios controlling just how much scanning will be done at any given time, and how much memory should be put to different uses.

  • aa-096-swap_out is, according to Andrew, "probably the most important patch." It includes much more aggressive shrinking of kernel caches when memory is tight, and a tweak that keeps the system from repeatedly trying to swap things out when those attempts are not succeeding. This patch also gets rid of the "out-of-memory killer," taking the approach of simply failing memory allocations instead. The "init" task gets some special protection; its memory allocations will always succeed, by spinning and waiting for memory if need be.

  • aa-100-local_pages deals with an interesting fairness issue: a process may go off freeing memory to satisfy an allocation, only to find that, when it's done, other processes have stolen all the pages it freed. Andrea's code kept a whole list of freed pages that the freeing process could use first; Andrew has simplified it to only set aside a single page.

  • aa-110-zone_accounting increases the resolution of the system's memory accounting, and changes the locking rules as well.

  • Small tweaks. A number of the patches perform relatively simple housekeeping or other such tasks; these include aa-010-show_stack, aa-040-touch_buffer, aa-120-try_to_free_pages_nozone, aa-140-misc_junk, aa-150-read_write_tweaks, aa-160-lru_release_check, aa-170-drain_cpu_caches, aa-180-activate_page_cleanup, aa-190-block_flushpage_check, and aa-200-active_page_swapout.

Together, these patches represent a great deal of work by both Andrea and Andrew. With luck, they'll find their way into a better VM in the near future.
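
The fairness fix in aa-100-local_pages is easier to see in a toy model. The following is only a sketch in user-space C, not code from either patch: pages are merely counted, and the names toy_pool, toy_task, toy_reclaim(), and toy_alloc() are invented for illustration. It shows the simplified single-page variant, in which a task that frees memory keeps one page for itself before releasing the rest to everybody.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: pages are only counted; no real memory is managed. */
struct toy_pool {
    size_t free_pages;       /* pages that any task may allocate */
};

struct toy_task {
    bool has_local_page;     /* the single page set aside for this task */
};

/*
 * The task frees `freed` pages on its own behalf. The first page is
 * parked in the task's private slot; the rest go to the shared pool.
 */
void toy_reclaim(struct toy_pool *pool, struct toy_task *task, size_t freed)
{
    assert(pool != NULL && task != NULL);
    if (freed > 0 && !task->has_local_page) {
        task->has_local_page = true;
        freed--;
    }
    pool->free_pages += freed;
}

/* Allocate one page, preferring the task's reserved page. */
bool toy_alloc(struct toy_pool *pool, struct toy_task *task)
{
    assert(pool != NULL && task != NULL);
    if (task->has_local_page) {
        task->has_local_page = false;
        return true;
    }
    if (pool->free_pages == 0)
        return false;        /* nothing left: the allocation fails */
    pool->free_pages--;
    return true;
}
```

In this model, if one task reclaims three pages and a second task immediately takes the two that reached the shared pool, the first task's next allocation still succeeds from its reserved page. Without the reserved slot it would have done the work of freeing memory and come away with nothing.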

Exit sections and monolithic kernels. The kernel has had, for some time, the ability to mark functions and data with an "exit" flag. The traditional use for this marker is to flag functions which are used at module unload time. Modules need cleanup functions so that they can be gracefully removed from the kernel. When those modules are linked statically into the kernel, however, they will never be removed. In this case, functions and data marked with the "exit" flag are simply discarded, making the kernel image smaller.

It's a worthwhile optimization. Anybody who has tried building a kernel with a modern binutils distribution, however, will have experienced the annoying, useless "undefined reference to `local symbols in discarded section .text.exit'" message that accompanies a failed link. The problem is simple: the kernel has numerous pointers to exit functions and data. Usually a human can determine that, in cases where the exit section has been discarded, those pointers will never be used; they are thus harmless. The linker doesn't see things that way, though, and newer versions refuse to complete the link when dangling exit pointers exist.

The workaround has been to define a __devexit_p macro which causes exit pointers to disappear in non-modular code. It's a bit of a hack, but it gets the job done. The __devexit_p calls have been slowly working their way into the kernel code.

But now Linus has come up with a different approach. Rather than discard all that exit code, why not keep it in the kernel and use it to gracefully shut down the hardware at system shutdown time? The code is there; one might as well make use of it, even if the kernel gets a bit bigger. The days of __devexit_p in the kernel may be numbered.
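
For readers who have not run into it, the macro workaround that Linus's idea would make unnecessary can be sketched roughly as follows. This is a simplified user-space model, not the kernel's actual definitions: demo_exit, demo_exit_p, DEMO_MODULE, and struct demo_driver are names made up for this example, and the linker-script step that actually discards the exit section is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's markers; names are illustrative. */
#if defined(DEMO_MODULE)
/* "Modular" build: cleanup code is kept, so the pointer is real. */
#define demo_exit
#define demo_exit_p(fn) (fn)
#elif defined(__GNUC__) && defined(__ELF__)
/* "Static" build: park cleanup code in its own section and hide the pointer. */
#define demo_exit __attribute__((unused, section(".text.exit")))
#define demo_exit_p(fn) NULL
#else
/* Other toolchains: no section placement, but still hide the pointer. */
#define demo_exit
#define demo_exit_p(fn) NULL
#endif

struct demo_driver {
    const char *name;
    int  (*probe)(void);
    void (*remove)(void);    /* NULL when the cleanup code is not built in */
};

int demo_cleanups_run;       /* counts calls to demo_remove() */

static int demo_probe(void)
{
    return 0;
}

/* Cleanup function: only needed if the driver can ever be unloaded. */
static void demo_exit demo_remove(void)
{
    demo_cleanups_run++;
}

struct demo_driver demo_drv = {
    .name   = "demo",
    .probe  = demo_probe,
    /* Without the macro, this reference would dangle into a discarded section. */
    .remove = demo_exit_p(demo_remove),
};

/* Callers must cope with a missing remove hook. Returns 0 if cleanup ran. */
int demo_unregister(const struct demo_driver *drv)
{
    assert(drv != NULL);
    if (drv->remove == NULL)
        return -1;
    drv->remove();
    return 0;
}
```

Built without -DDEMO_MODULE, demo_drv.remove is NULL, so nothing refers to the function in the exit section and the linker has nothing to complain about. Built with -DDEMO_MODULE, the pointer is real and demo_unregister() runs the cleanup code.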

Other patches and updates released this week include:

Alternate kernel trees:

  • Andrea Arcangeli's 2.4.19-pre3-aa2 includes the VM-32 patch and a number of performance-oriented fixes.

  • Jörg Prante has released 2.4.19-pre3-jp8, which adds a large set of patches to 2.4.19-pre3.

Core kernel code:

  • Hubertus Franke has posted a well-documented patch which speeds up assignment of new process IDs.

  • Neil Schemenauer has released capwrap, a kernel module allowing an administrator to run executables with specific (restricted) capabilities.

  • Version 12h of Rik van Riel's reverse mapping VM code is available.

Development tools:

  • ksymoops 2.4.5 was released by Keith Owens.

  • Karim Yaghmour has released version 0.9.5pre6 of the Linux Trace Toolkit.

Device drivers:

  • David Miller has released the eighth beta of the new Tigon3 driver.

  • Jörg Prante has posted instructions for backporting the 2.5 ALSA code to 2.4 kernels.

Kernel building:

  • Roman Zippel has announced a new kernel configuration mechanism. It is designed to be simpler and faster than CML2. See the followup posting for the latest version. "So far I hadn't very much feedback. What's up? Is everyone suddenly completely happy with cml2? Now is your chance to evaluate the alternatives or does this require too much work before you can start flaming?"

Networking:

  • Jean Tourrilhes has released a version of the new wireless driver API for the 2.4 kernel series.

  • This week's Affix Bluetooth stack release from Dmitry Kasatkin is version 0_96.

  • The Netfilter team has released iptables 1.2.6.

Section Editor: Jonathan Corbet


March 21, 2002

Eklektix, Inc. Linux powered! Copyright © 2002 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds