[LWN.net]

Kernel development


The current development kernel is 2.5.17, which was announced on May 20. This release includes the new quota code (see below), some VFS changes, and quite a few other improvements and fixes.

Linus released 2.5.16 on May 18; it contained a bunch of low-level x86 paging changes, the usual IDE patches, the 64-bit jiffies patch (no more uptime wraparounds), a bunch of USB updates, an IrDA update, and various other fixes.

Linus has been posting changelogs in a new, shorter format; for those who prefer the details, here are the long-format logs for 2.5.16 and 2.5.17.

The latest prepatch from Dave Jones is 2.5.15-dj2.

The current 2.5 status summary from Guillaume Boissiere is dated May 22.

The current stable kernel release is 2.4.18; Marcelo has not released any 2.4.19 prepatches since May 2.

Alan Cox released 2.4.19-pre8-ac5 on May 20; it contains the latest reverse mapping VM, various copy_to/from_user cleanups (see below), a bunch of aacraid and I2O changes, and many other fixes.

Alan has also released 2.2.21, which contains only one small fix added after the last release candidate.

The new disk quota code went into 2.5.17. The reimplemented quota system is the work of Jan Kara, who has posted a brief summary of the changes.

With the new quota implementation there is, of course, a new quota file format. It brings a number of advantages, including 32-bit user IDs and accounting for file sizes in bytes rather than blocks. Filesystems like ReiserFS, which take pains to store very small files efficiently, will benefit from the more accurate quota accounting. The old quota format is still supported, however, along with any other format that people might wish to implement: quota formats may now be implemented by separate modules and plugged in as needed.
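To see why byte-based accounting matters for small files, here is a minimal sketch. It is not taken from the quota code; the 1024-byte block size and both function names are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed quota block size, for illustration only. */
#define QUOTA_BLOCK_SIZE 1024u

/* Block-based accounting: usage is rounded up to whole blocks. */
static uint64_t charge_in_blocks(uint64_t file_bytes)
{
    uint64_t blocks = (file_bytes + QUOTA_BLOCK_SIZE - 1) / QUOTA_BLOCK_SIZE;
    return blocks * QUOTA_BLOCK_SIZE;
}

/* Byte-based accounting: the charge is the space actually used. */
static uint64_t charge_in_bytes(uint64_t file_bytes)
{
    return file_bytes;
}
```

Under these assumptions a 100-byte file is charged 1024 bytes by the block-based routine but only 100 bytes by the byte-based one, a difference that adds up quickly on a filesystem that packs many tiny files together.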

The filesystem interface to quotas has changed, of course. Filesystems now have much more flexibility to override, modify, and extend quota operations. Thus, for example, journaling filesystems can journal quota operations as well.

The old quota tools can be supported through a compatibility interface, but really taking advantage of the new code will require new tools. Those can be found on the Linux quota SourceForge page.

Software suspend goes in. Pavel Machek posted a new version of the software suspend code (written originally by Gabor Kuti) asking "What can I do to make this applied?" The answer, according to Linus, is nothing - he has accepted it for 2.5.18.

The swsusp patch provides a laptop-style suspend capability to any machine, whether the underlying BIOS power management code supports it or not. When you tell your system to suspend (via a "magic sysrq" key, a user-space tool, or /proc/acpi/sleep), it starts by flushing everything it can to disk. Files are synced to disk, processes are swapped out, and in-kernel data structures are reduced to a minimal state. This step is required to save important data, of course, but it also frees up a great deal of memory which, as a result, will not need to be saved separately.

The suspend code then sets up a new set of page tables for all remaining memory which must be saved; the swap code, at that point, can be used to save the rest to disk. Once that is done, the system can be halted. Restoring the system is done by booting with the "resume=" option; it pulls in all of the saved memory and generally reverses the steps taken above.

Suspending a running system in this way is a task with many potential pitfalls, and, no doubt, one or two of them remain in the code. It is marked "experimental" for a reason. Nonetheless, this patch has been circulating for a long time, and has been tested by quite a few people. It was time for it to go into the mainline kernel.

Still waiting for kbuild. Keith Owens has sent out his 'third and final attempt' to get a response from Linus on when and how the new kernel build patch might get merged. Linus still does not appear to have answered Keith directly, but he did let this slip in a thread on a completely different subject:

I'm hoping we can get there in small steps, rather than a big traumatic merge. I'd love to just try to merge it piecemeal.

This suggests that somebody needs to split apart the kbuild patch into a number of small, incremental steps. Of course, this patch is not the easiest to split in that manner...

/dev/port goes out. Martin Dalecki, seemingly, has not been flamed enough despite all of his IDE work. So he set out to remove the /dev/port device. /dev/port is a pseudo device which makes it easy for suitably privileged application programs to access (x86) I/O ports via read and write calls. Martin cites a number of problems with the code, including the fact that nobody is using it.
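For readers who have never used it, access through /dev/port looks roughly like the sketch below: the file offset selects the port number and a one-byte read returns the value at that port. The helper name read_port_byte and its path parameter are hypothetical; the path is passed in so the routine can be tried on an ordinary file without root privileges.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one byte at offset `port` from a /dev/port-style file.
 * Hypothetical helper: returns 0 on success, -1 on any failure.
 */
static int read_port_byte(const char *path, off_t port, unsigned char *value)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset plays the role of the I/O port number. */
    if (lseek(fd, port, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, value, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

A suitably privileged program would call read_port_byte("/dev/port", 0x70, &v) or similar; the port number here is only an example.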

Interestingly, Martin didn't get his desired flames, despite a separate attempt to stir them up. Linus agrees that it should probably go; about the only dissent came from Alan Cox, who claims to have seen it used, especially in scripting languages. Linus has not issued a final decree, but it looks like /dev/port is no more.

copy_*_user and errors. The kernel, of course, runs in its own memory space that is distinct from the address space given to each user process. So some care must be taken when moving data between the two; it's not just a matter of following a pointer. The kernel provides a whole set of functions that copy data between kernel and user space; the two most general are called copy_to_user and copy_from_user.

A common convention for utility routines within the kernel is to return zero on success, and an error code (suitable for passing back to user space) on failure. But the copy functions are different: they return the number of bytes that were not actually copied. For most operations, that value will be zero - everything is copied as requested. When something goes wrong, however, the return value tells just how far into the operation the error happened.

Rusty Russell sees a problem with this interface: kernel programmers get confused and expect that the copy functions follow the same conventions as most other kernel utilities. That leads to code like the following (taken from the Intermezzo filesystem):

        error = copy_from_user(&hdr, buf, sizeof(hdr));
        if ( error )
                return error;

The problem, of course, is that the "error" returned to the user does not look like an error code: it is a positive count of uncopied bytes rather than a negative value such as -EFAULT. Thus problems are not caught and bugs result. That's when the programmer is happy that liability laws have not caught up to software yet.
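The difference between the buggy pattern and the conventional fix can be tried out in user space. The sketch below uses a stand-in, mock_copy_from_user, which is not the kernel function: its extra `readable` parameter, the struct hdr layout, and both read_hdr_* helpers are assumptions made for illustration. Only the return convention (bytes not copied) mirrors the real interface.

```c
#include <errno.h>
#include <string.h>

/*
 * User-space stand-in for copy_from_user(), for illustration only.
 * It copies at most `readable` bytes and, like the kernel function,
 * returns the number of bytes that could NOT be copied.
 */
static unsigned long mock_copy_from_user(void *to, const void *from,
                                         unsigned long n,
                                         unsigned long readable)
{
    unsigned long ok = n < readable ? n : readable;

    memcpy(to, from, ok);
    return n - ok;
}

/* Hypothetical header structure. */
struct hdr {
    char magic[4];
    int len;
};

/* Buggy pattern: a positive byte count leaks out as the "error". */
static long read_hdr_buggy(struct hdr *h, const void *buf,
                           unsigned long readable)
{
    long error = (long)mock_copy_from_user(h, buf, sizeof(*h), readable);

    if (error)
        return error;
    return 0;
}

/* Conventional fix: any short copy becomes -EFAULT. */
static long read_hdr_fixed(struct hdr *h, const void *buf,
                           unsigned long readable)
{
    if (mock_copy_from_user(h, buf, sizeof(*h), readable))
        return -EFAULT;
    return 0;
}
```

When only part of the header is readable, read_hdr_buggy returns a small positive number that a caller will not recognize as a failure, while read_hdr_fixed returns -EFAULT.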

Rusty states that, of the 5500 copy calls in the kernel, 415 are incorrect, despite an audit done one year ago. He would like to change the copy functions to return an error code like most other utilities, or to send a segmentation fault signal and return nothing at all. Either solution would eliminate what he sees as a trap which trips up many or most kernel programmers sooner or later. (Of course, being Rusty, he expressed it in a rather more colorful manner.)

Making internal kernel interfaces safer to use seems like a good cause, but Rusty seems to be mostly alone on this one. The main counterpoint is that the "partial success" return value can be useful in some situations: restarting system calls after signals or simply reporting a partial result back to user space. There are, however, very few places in the code where that information is actually used.

On the other hand, it has been pointed out that a partial success value "n" need not indicate that the first n bytes were copied. Trying to speed things up with fancy MMX instructions could cause things to be copied in strange orders. Andrew Morton has also pointed out a bug in the copy code that can corrupt data (though it's not something that comes up in normal use). That bug could be fixed by copying from the far end of the array first in some situations. The point of all this is that a partial success might not tell you which bytes were actually copied.
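A toy example makes the point concrete. The routine below is hypothetical and is neither the kernel's copy code nor the proposed fix: it copies from the far end of the buffer toward the start and gives up when it reaches the first `fault_below` bytes, which stand in for an unreadable region.

```c
/*
 * Hypothetical copy that works from the far end of the buffer.
 * Source bytes at indices below `fault_below` are treated as
 * unreadable. Returns the number of bytes NOT copied.
 */
static unsigned long copy_backward(char *to, const char *from,
                                   unsigned long n,
                                   unsigned long fault_below)
{
    unsigned long i = n;

    while (i > fault_below) {
        i--;
        to[i] = from[i];
    }
    return i;   /* indices 0 .. i-1 were never written */
}
```

Copying 8 bytes with fault_below set to 3 returns 3, so 5 bytes were copied. A caller that assumed the first 5 bytes of the destination were valid would be wrong: the copied bytes sit at indices 3 through 7, and indices 0 through 2 were never touched.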

That notwithstanding, it looks like very little will actually change - Linus has spoken:

The current interface is quite well-defined, and has good semantics. Every single argument against it has been totally bogus, with no redeeming values.

One cannot accuse Linus of not being clear on what he thinks.

So the one remaining approach, it seems, is to simply go through and fix all of the broken copy calls on a regular basis. Arnaldo Carvalho de Melo has already jumped into that task, posting fixes for intermezzo, OSS, ISDN, block drivers, and USB. But chances are more mistakes will creep in with future patches.

Section Editor: Jonathan Corbet


May 23, 2002

Eklektix, Inc. Linux powered! Copyright © 2002 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds