LWN.net


Kernel development


The current kernel release is 2.4.4, which was released on April 28. This release contains, of course, the zero-copy networking code, and a number of other enhancements and bug fixes.

It also evidently contains some new bugs: the complaint level for 2.4.4 appears to be higher than for other 2.4.x releases. The "run children first" change to the fork() system call (discussed in the April 19 LWN Kernel Page) seems to have caused quite a few problems, and it has already been reverted in Linus's 2.4.5pre1 prepatch. A number of other problems have been reported as well; people without a pressing need for 2.4.4 might want to wait for 2.4.5.

As noted, Linus has since released 2.4.5pre1 with some additional fixes. Alan Cox, meanwhile, is at 2.4.4ac3 with a rather longer set of fixes.

Trashing your filesystem with dump. It has been known for a very long time that using dump to back up live filesystems can result in corrupt backups. It turns out that, with Linux kernels through 2.4.4, dumping a live filesystem has the potential to corrupt the filesystem in place, even if the dump process has no write access.

Alexander Viro reported the bug that makes this possible. It can happen only on SMP systems and is not easy to trigger, but it is there. Essentially, if the filesystem allocates a new metadata block and a separate process reads that block at the wrong time, the wrong data will be written back to disk. The fix is relatively straightforward and has already been incorporated into 2.4.5pre1.

Linus pointed out an interesting little fact as part of this discussion: dump will not work correctly on 2.4-based systems in any case. The filesystem keeps quite a bit of useful information in the page cache - and will do so even more in the future. dump, however, works with the raw device, which deals with the buffer cache instead. The two are not always synchronized, and it is possible that dump will end up reading the wrong data. In case that's not clear enough:

So anybody who depends on "dump" getting backups right is already playing Russian roulette with their backups. It's not at all guaranteed to get the right results - you may end up having stale data in the buffer cache that ends up being "backed up".

For now, there is really no easy way to fix dump for 2.4. If you're using it, this might be a good time to go looking for a different tool.
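To make the page cache/buffer cache problem concrete, here is a toy Python model (not kernel code) of a single disk block that can be cached in two places. The class and method names are invented for illustration, and the model assumes, in line with Linus's description, that writing back the page cache does nothing to refresh a copy that is already sitting in the buffer cache.

```python
from typing import Optional

# Toy model, not kernel code: one disk block that can be cached in two
# independent places. Filesystem updates go through the page cache;
# raw-device reads (what dump does) go through the buffer cache.


class ToyBlock:
    def __init__(self, disk: str) -> None:
        self.disk = disk                         # contents on the platter
        self.page_cache = disk                   # the filesystem's view
        self.page_dirty = False                  # page cache newer than disk
        self.buffer_cache: Optional[str] = None  # raw-device view, if cached

    def fs_write(self, data: str) -> None:
        """Filesystem update: lands in the page cache, written back later."""
        self.page_cache = data
        self.page_dirty = True

    def raw_read(self) -> str:
        """Raw-device read: fill the buffer cache from disk, then reuse it."""
        if self.buffer_cache is None:
            self.buffer_cache = self.disk
        return self.buffer_cache

    def writeback(self) -> None:
        """Flush the page cache to disk. Assumption of this model: nothing
        refreshes or invalidates an existing buffer-cache copy."""
        if self.page_dirty:
            self.disk = self.page_cache
            self.page_dirty = False


b = ToyBlock("old")
b.raw_read()                 # dump touches the block early
b.fs_write("new")            # the filesystem then updates it
b.writeback()                # the new data reaches the disk...
print(b.disk, b.raw_read())  # new old  <- ...but dump still "backs up" old
```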

A 2.4 swap bug - maybe. A discussion of Linux swapping behavior turned up an interesting aspect of how the system manages swap space. Swap space, of course, holds copies of pages that have been moved out of main memory. It turns out that when a page is brought back into memory from swap, its slot in the swap area is not released. Thus, in some situations, Linux can "run out" of swap space even though much of that space is occupied by copies of pages that are already back in main memory. According to Alan Cox, this behavior is forcing some large systems to stay with the 2.2 kernel.

At first blush, the proper course of action seems simple: when a page is swapped back into memory, its swap slot should be freed. As is often the case, though, life is not that simple. Some of the twists that come up here (as pointed out by Stephen Tweedie) include:

  • The system tries to group memory areas together in the swap file. Freeing swap slots individually would destroy that grouping, thus fragmenting the swap area. That, in turn, can lead to slower swapping performance.

  • Suppose you swap a page in, then, due to memory pressure, have to swap it back out again. If the page has not been modified, the copy on disk is still valid, and the page can be freed immediately. If, instead, the slot has been freed, the page must be written again.

  • The Linux virtual memory system does not make it easy to find all of the page table entries that point to a particular page. When a process swaps in a page, its own page table is updated accordingly. But if other processes have page tables pointing to the swapped page, they will continue to point to the disk copy. Until all of those references are changed, the disk copy cannot go away.
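The second and third points can be illustrated with a small toy model. Everything in it is hypothetical: the names, and the simple per-slot reference count standing in for whatever bookkeeping the kernel really does. It only shows why a clean page that keeps its slot can be swapped out again with no I/O, why a slot cannot be released while another process's page table still refers to it, and how retained slots can fill up the swap area.

```python
from typing import Optional

# Toy model with hypothetical names; not the 2.4 kernel's data structures.
# Each swap slot carries a reference count: one per page-table entry that
# still points at it, plus one for the swap cache while the page keeps it.


class OutOfSwap(Exception):
    pass


class SwapArea:
    def __init__(self, nslots: int) -> None:
        self.refs = [0] * nslots  # 0 means the slot is free
        self.writes = 0           # pages written to disk so far

    def alloc(self) -> int:
        for i, r in enumerate(self.refs):
            if r == 0:
                return i
        raise OutOfSwap("out of swap space")


class Page:
    def __init__(self, mappers: int = 1) -> None:
        self.mappers = mappers           # processes mapping this page
        self.dirty = True                # newer than any copy on disk
        self.slot: Optional[int] = None  # retained swap slot, if any


def swap_out(area: SwapArea, page: Page) -> None:
    # I/O is needed only if the page changed or has no valid disk copy.
    needs_write = page.dirty or page.slot is None
    if page.slot is None:
        page.slot = area.alloc()
    if needs_write:
        area.writes += 1
    page.dirty = False
    # Every mapper's entry now refers to the slot, plus the swap cache.
    area.refs[page.slot] = page.mappers + 1


def swap_in(area: SwapArea, page: Page) -> None:
    # Only the faulting process's page table is fixed up; the others
    # keep pointing at the disk copy, so the slot stays referenced.
    assert page.slot is not None
    area.refs[page.slot] -= 1


def free_swap_cache(area: SwapArea, page: Page) -> bool:
    """Release the slot early; allowed only when nothing else uses it."""
    if page.slot is None or area.refs[page.slot] != 1:
        return False
    area.refs[page.slot] = 0
    page.slot = None
    return True


area = SwapArea(nslots=1)
shared = Page(mappers=2)
swap_out(area, shared)                # first swap-out: one write
swap_in(area, shared)                 # process A faults the page back in
print(free_swap_cache(area, shared))  # False: B still points at the slot
swap_out(area, shared)                # still clean: no second write
print(area.writes)                    # 1
swap_in(area, shared)
swap_in(area, shared)                 # page now resident for A and B...
try:
    swap_out(area, Page())            # ...yet its slot fills the swap area
except OutOfSwap as err:
    print(err)                        # out of swap space
```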

The proper solution, then, would appear to be to retain the copy in the swap cache for as long as there is no real virtual memory pressure. Once things get tight, it's time to start throwing things away. In some cases, though (such as when the swap copy of a page is still valid), it may be better to toss out the in-memory copy of the page.
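One possible reading of that policy can be sketched as a decision function for a page that is resident in memory and still holds a swap slot. The low-water threshold, the notion of "tight" memory, and all of the names are assumptions made for illustration; a real VM would weigh many more factors.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep both copies"
    FREE_SLOT = "free the swap slot"
    DROP_MEMORY = "drop the in-memory copy"


def choose(page_dirty: bool, free_swap: int, total_swap: int,
           memory_tight: bool, low_water: float = 0.25) -> Action:
    """Hypothetical policy for a resident page that still holds a swap
    slot. page_dirty means the swap copy is stale."""
    swap_tight = free_swap < low_water * total_swap
    if memory_tight and not page_dirty:
        # The swap copy is still valid: RAM can be reclaimed with no I/O.
        return Action.DROP_MEMORY
    if swap_tight:
        # Swap is scarce: give the slot back; the page can be written to
        # a fresh slot if it is ever swapped out again.
        return Action.FREE_SLOT
    # No pressure that this copy can relieve: keeping it costs little.
    return Action.KEEP


for dirty, free, tight in [(False, 90, False), (False, 90, True),
                           (True, 10, False), (True, 90, True)]:
    print(dirty, free, tight, choose(dirty, free, 100, tight).value)
```

The ordering is a design choice: reclaiming memory from a clean page is checked first because it needs no I/O at all.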

Moral: virtual memory is never simple.

SGI releases XFS 1.0. SGI has announced the release of XFS 1.0. The 2.4 kernel now has another journaling filesystem in a stable release state; XFS also offers a number of features for users with intense I/O bandwidth requirements. It claims to work with NFS, and comes with an installer for Red Hat Linux 7.1 systems.

Perhaps not wanting to be left out entirely, IBM has released JFS beta 3 (release 0.3.0).

ECN enabled on kernel.org. The kernel.org FTP server has enabled ECN (the Explicit Congestion Notification protocol). If you find you're now having a hard time downloading that new kernel, there's a chance you're behind a broken firewall which doesn't handle ECN properly. See Jeff Garzik's ECN page for help if you find yourself in that situation.
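For the curious: on the TCP side, ECN is negotiated with two header flags (ECE and CWR) that occupy bits older TCP code treated as reserved. An ECN-capable client sets both on its initial SYN, which is exactly the packet a broken firewall may refuse. The sketch below assumes the flag layout codified in RFC 3168 and classifies the flags byte of a raw TCP header; `naive_firewall_accepts` is a hypothetical stand-in for such a broken filter, not a model of any particular product.

```python
import struct

# TCP flag bits in byte 13 of the TCP header, lowest bit first.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))
HANDSHAKE_BITS = SYN | ACK | ECE | CWR


def tcp_flags(header: bytes) -> int:
    """Return the flags byte of a raw TCP header."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    return header[13]


def is_ecn_setup_syn(flags: int) -> bool:
    # An ECN-capable client sets both ECE and CWR on its initial SYN.
    return (flags & HANDSHAKE_BITS) == (SYN | ECE | CWR)


def is_ecn_setup_synack(flags: int) -> bool:
    # A server that agrees replies with SYN+ACK and ECE set, CWR clear.
    return (flags & HANDSHAKE_BITS) == (SYN | ACK | ECE)


def naive_firewall_accepts(flags: int) -> bool:
    # Hypothetical broken filter: treats the formerly reserved ECE/CWR
    # bits as invalid and drops any packet that has either one set.
    return (flags & (ECE | CWR)) == 0


# src port, dst port, seq, ack, data offset (5 words), flags, window,
# checksum, urgent pointer: a bare 20-byte header in network byte order.
syn = struct.pack("!HHIIBBHHH", 40000, 80, 1, 0, 5 << 4,
                  SYN | ECE | CWR, 65535, 0, 0)
flags = tcp_flags(syn)
print(is_ecn_setup_syn(flags), naive_firewall_accepts(flags))  # True False
```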

Other patches and updates released this week include:

  • A group of students at Northern Michigan University has announced a set of benchmarks that were run on kernels from 2.0.1 through 2.4.0. They show how performance in a number of areas has changed over time.

  • A new Linux security module patch has been released by Greg Kroah-Hartman.

  • A new FreeS/WAN KLIPS2 design, meant to work well with netfilter, has been announced by Richard Guy Briggs. He's looking for feedback. Those who are not easily offended might also enjoy the Linux FreeS/WAN poster on his site.

  • Keith Owens has released a few new versions of the kdb kernel debugger which work with recent kernel releases.

  • Jeff Mahoney announced a large patch to ReiserFS which makes it work on big-endian systems.

  • Daniel Phillips has posted a patch to make his directory indexes work in the page cache. As with many of his patches, this one includes a lengthy discussion of what changes have been made and why; it makes for interesting reading on how the VFS works. Daniel subsequently released a pair of new patches, one of which works with Alexander Viro's "directories in the page cache" patch.

  • Jeff Dike has released a new version of user-mode Linux which works with 2.4.4 and contains a number of fixes.

  • Eric Raymond's CML2 patch is up to CML2 1.3.3.

  • Andreas Gruenbacher released version 0.7.11 of the access control list patch, quickly followed by version 0.7.12.

  • Matthew Wilcox has posted a description of what he thinks should be done with file locking in the 2.5 development series.

Section Editor: Jonathan Corbet


May 3, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds