The current kernel release is still 2.4.3. Linus's 2.4.4 prepatch has reached 2.4.4pre4. It includes a good deal more from Alan Cox's "ac" series, a number of fixes, and, interestingly, the zero-copy networking patch (see below). Alan Cox's series (currently at 2.4.3ac9) is shrinking as its patches are merged into the mainline kernel, but quite a bit remains there. Some of it, including the user-mode Linux patch, will evidently not go to Linus at all, at least for now.
Zero-copy networking will be in 2.4.4. This patch, by David Miller, Alexey Kuznetsov, and others, has been in development and testing for some time, and was incorporated into the "ac" kernel series back in 2.4.2ac4. In a way, it is a surprising change to see in a stable kernel series, since it makes fundamental changes deep in the networking code. From all reports, however, it is solid, and, in certain situations, it should produce significant performance benefits.
Zero-copy networking speeds things up by avoiding, whenever possible, copies of the data to be transferred. In the optimal case, a buffer full of data sent over the network by an application (an FTP server, say) goes directly from the application's memory to the network interface. Without zero-copy networking, that is not how things are done: at a minimum, the data is copied into kernel space and assembled into one or more packets before going to the wire. All that copying costs CPU time and memory bandwidth, and it pushes more useful data out of the processor cache. It's not surprising that people want to eliminate it.
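From an application's point of view, the usual way to get this benefit is the sendfile() system call. It asks the kernel to move file data toward another descriptor without passing it through the application's buffers. The sketch below is a user-space illustration, not code from the patch. It contrasts a conventional read()/write() loop, where every byte crosses the user/kernel boundary twice, with sendfile(). The function names are made up. Whether the kernel itself avoids copying depends on the destination: the gain described here applies to network sockets on drivers that support it. The file-to-file usage in the test assumes a modern Linux system, since sendfile() to a non-socket descriptor is not supported on every kernel version.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

#define BOUNCE_SIZE 4096

/* Traditional path: read() copies data from the kernel into a user-space
 * buffer, then write() copies it back into the kernel. */
ssize_t copy_through_user_buffer(int out_fd, int in_fd, size_t count)
{
    char *buf = malloc(BOUNCE_SIZE);
    size_t total = 0;

    if (buf == NULL)
        return -1;
    while (total < count) {
        size_t want = count - total < BOUNCE_SIZE ? count - total : BOUNCE_SIZE;
        ssize_t n = read(in_fd, buf, want);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            free(buf);
            return -1;
        }
        if (n == 0)
            break;                      /* end of input */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0 && errno == EINTR)
                continue;               /* retry the interrupted write */
            if (w <= 0) {
                free(buf);
                return -1;
            }
            done += w;
        }
        total += (size_t)n;
    }
    free(buf);
    assert(total <= count);
    return (ssize_t)total;
}

/* sendfile() path: the application never touches the data; the kernel
 * moves it from in_fd toward out_fd on its own. */
ssize_t copy_with_sendfile(int out_fd, int in_fd, size_t count)
{
    size_t total = 0;

    while (total < count) {
        ssize_t n = sendfile(out_fd, in_fd, NULL, count - total);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            break;                      /* end of input */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Both functions produce the same bytes at the destination. The difference is only in how many times the data is copied along the way.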
Making zero-copy work is not straightforward, and the patch is large. Various issues have to be dealt with, including:
To handle all of this stuff, the zero-copy networking patch makes some fundamental changes to the networking core code. Traditionally, packets are passed around via a struct sk_buff structure, usually referred to as an "skb." The skb contains the entire packet, headers and all. With zero-copy, an skb can now be "paged," or "nonlinear," meaning that it consists of several pieces which are not contiguous in memory. Much of the code which handles skb structures must be changed to take this new structure into account.
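To illustrate what "nonlinear" means, here is a toy structure built on the same idea: a linear area holding the headers, plus a list of fragments that live elsewhere in memory. The real sk_buff tracks its page fragments through nr_frags and a frags[] array and is far more involved. Everything named toy_* here is hypothetical. Code that needs the whole packet in one contiguous piece has to gather the fragments first, and that gathering is exactly the copy the zero-copy path tries to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FRAGS 6               /* hypothetical limit */

struct toy_frag {                     /* a piece of payload stored elsewhere */
    const unsigned char *addr;
    size_t size;
};

struct toy_skb {
    const unsigned char *head;        /* linear part: headers, perhaps some data */
    size_t head_len;
    struct toy_frag frags[TOY_MAX_FRAGS];
    unsigned int nr_frags;            /* 0 means a traditional, linear skb */
};

/* Total packet length: the linear part plus every fragment. */
size_t toy_skb_len(const struct toy_skb *skb)
{
    size_t len = skb->head_len;
    for (unsigned int i = 0; i < skb->nr_frags; i++)
        len += skb->frags[i].size;
    return len;
}

int toy_skb_is_nonlinear(const struct toy_skb *skb)
{
    return skb->nr_frags != 0;
}

/* Gather all pieces into one contiguous buffer. Returns the number of bytes
 * written, or 0 if out_size is too small. */
size_t toy_skb_linearize(const struct toy_skb *skb,
                         unsigned char *out, size_t out_size)
{
    size_t len = toy_skb_len(skb);
    size_t off = skb->head_len;

    if (len > out_size)
        return 0;
    memcpy(out, skb->head, skb->head_len);
    for (unsigned int i = 0; i < skb->nr_frags; i++) {
        memcpy(out + off, skb->frags[i].addr, skb->frags[i].size);
        off += skb->frags[i].size;
    }
    assert(off == len);
    return len;
}
```

For example, a toy skb with a three-byte header "HDR" and fragments "ab" and "cde" reports a length of eight and linearizes to "HDRabcde".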
The driver interface has also seen changes. There is a new "features" field in the net_device structure, which marks some of the capabilities of the device (and its driver). These include the ability to compute checksums in hardware, to deal with high memory, and to perform scatter/gather I/O. This field was actually added in 2.4.0-test12, just before the official 2.4.0 release, but only with the zero-copy patch is it seeing real use.
The change in the driver interface means that zero-copy I/O is only possible if the relevant network driver has been updated to support it. So far, only the AceNIC and Sun HME drivers have been fully converted. The work required appears to be modest, assuming the hardware itself is capable (scatter/gather I/O and hardware checksumming in particular), so more drivers will likely be converted in the future.
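The sketch below models how such a capability word might be consulted before sending a payload page directly. The TOY_F_* flags, the struct and the function are all hypothetical, loosely inspired by the kernel's NETIF_F_* constants. The rule that a zero-copy send needs both scatter/gather and hardware checksumming, with high-memory DMA as an extra condition, is a simplification assumed here. It is not the exact check made by the networking core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits, modeled on (not copied from) the kernel's
 * NETIF_F_* flags. */
enum {
    TOY_F_SG      = 1u << 0,  /* scatter/gather: can send a packet in pieces */
    TOY_F_HW_CSUM = 1u << 1,  /* hardware computes the checksum */
    TOY_F_HIGHDMA = 1u << 2   /* can DMA directly from high memory */
};

struct toy_net_device {
    const char *name;
    unsigned int features;    /* capability bits advertised by the driver */
};

/* Can a payload page be handed to this device without copying it first?
 * Scatter/gather is needed because headers and payload live in different
 * places; checksum offload is needed because, with no copy being made, the
 * checksum cannot be computed as a side effect of copying. */
bool toy_can_zero_copy(const struct toy_net_device *dev, bool page_in_highmem)
{
    const unsigned int need = TOY_F_SG | TOY_F_HW_CSUM;

    assert(dev != NULL);
    if ((dev->features & need) != need)
        return false;          /* fall back to the traditional copying path */
    if (page_in_highmem && !(dev->features & TOY_F_HIGHDMA))
        return false;          /* the page would first need copying to low memory */
    return true;
}
```

Under this model, a driver that has not been updated (features left at zero) simply keeps using the copying path. That is why unconverted drivers keep working but gain nothing.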
Zero-copy networking is not a win for everybody; it really only makes sense on high-end hardware and very fast networks. In that situation, though, it should be a real performance win; expect more amazing web server benchmark results in the near future.
Children first. Adam Richter posted a patch which makes a subtle change in the way the fork() system call works. It is an interesting example of how small tactical changes can affect operating system performance.
On Unix-like systems, the child of a process that calls fork() normally gets a copy of the parent's entire address space. Actually copying everything, of course, would be highly inefficient. Read-only memory (such as program code) can simply be shared, but writable memory requires a bit more cleverness. The technique used is to share the writable pages as well, but to mark them "copy on write" (or "COW"). Both processes see the same COW pages until one of them tries to make a change. At that point, the kernel makes a copy of the relevant page and makes it private to the writing process, which is unaware that anything has happened.
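The copy-on-write rule can be modeled with a reference count per page. This is a toy sketch with hypothetical names, not the kernel's memory-management code. fork() adds a reference instead of copying. A write to a page that someone else still maps triggers a private copy. A write to a page with only one user is done in place, with no copy at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64                /* tiny "pages" keep the example small */

struct toy_page {
    int refcount;                       /* number of processes mapping this page */
    unsigned char bytes[TOY_PAGE_SIZE];
};

/* fork(): the child maps the parent's page too; nothing is copied. */
struct toy_page *toy_share(struct toy_page *pg)
{
    pg->refcount++;
    return pg;
}

/* A write to a COW page. If another process still maps it, the writer first
 * gets a private copy; otherwise the page is reused in place. Returns the
 * page the writer now maps (NULL if allocation fails). */
struct toy_page *toy_write(struct toy_page *pg, size_t off, unsigned char val,
                           int *copies)
{
    assert(off < TOY_PAGE_SIZE && pg->refcount >= 1);
    if (pg->refcount > 1) {
        struct toy_page *priv = malloc(sizeof *priv);
        if (priv == NULL)
            return NULL;
        memcpy(priv->bytes, pg->bytes, TOY_PAGE_SIZE);
        priv->refcount = 1;
        pg->refcount--;                 /* the writer stops sharing the old page */
        pg = priv;
        (*copies)++;
    }
    pg->bytes[off] = val;
    return pg;
}

/* exit() or exec(): the process drops its mapping of the page. */
void toy_unmap(struct toy_page *pg)
{
    if (--pg->refcount == 0)
        free(pg);
}
```

Note the in-place case: once only one process maps a page, writing to it costs nothing extra. That detail is what makes the order in which parent and child run after fork() matter.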
The 2.4.3 kernel, on a fork(), puts the child process into the run queue and resumes executing in the parent. The child will run sometime later as part of the normal timesharing of the processor. It turns out that this is not the best way of doing things from a performance point of view, though.
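From the application's side, the common pattern is fork() followed almost immediately by exec() in the child. The minimal sketch below uses a hypothetical helper, spawn_and_wait(), built only on standard POSIX calls. After fork() returns, which of the two processes actually runs first is the scheduler's decision, and correct code like this must not depend on it. The kernel change discussed here affects performance, not behavior.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched for in PATH) in a child process and wait for it.
 * Returns the child's exit status, 127 if exec failed, or -1 on error. */
int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: usually only a few operations happen here (redirecting
         * file descriptors, say) before exec replaces the whole address
         * space, dropping the COW pages shared with the parent. */
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if exec failed */
    }
    /* Parent: if it went on writing to its data while the child still
     * shared those pages, each first write to a page would force a copy. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```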
The parent process will likely go on modifying its private data, causing the system to make copies of the various COW pages shared with the child process. But the child, in most cases, is unlikely to ever look at those pages. Instead, it will probably perform a few operations, then exec() some other program, which breaks its attachment to the shared pages. Once that has happened, the parent is again the only user of those pages and can write to them without copying anything. So if the child were to run first, the parent would probably not need to copy most of those pages, and performance would improve.
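To make the argument concrete, here is a deliberately crude cost model. The function and its parameters are hypothetical, and it ignores any pages the child itself writes before calling exec(). It counts how many pages the parent must copy, depending on when the child gets to run and call exec().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64

/* Count the COW copies the parent incurs after fork().
 * writes[i] is the page the parent writes in its i-th step. The child runs
 * and calls exec(), dropping its shared mappings, just before step
 * child_exec_step: 0 models "child runs first", nwrites models "parent
 * runs through all its writes first". */
int parent_cow_copies(const int *writes, size_t nwrites, size_t child_exec_step)
{
    bool shared[MAX_PAGES];
    int copies = 0;

    for (int p = 0; p < MAX_PAGES; p++)
        shared[p] = true;               /* after fork(), writable pages are shared */
    for (size_t i = 0; i < nwrites; i++) {
        if (i == child_exec_step)
            for (int p = 0; p < MAX_PAGES; p++)
                shared[p] = false;      /* exec(): the child lets go of every page */
        int pg = writes[i];
        assert(pg >= 0 && pg < MAX_PAGES);
        if (shared[pg]) {
            copies++;                   /* write fault on a shared page: copy it */
            shared[pg] = false;         /* the parent now owns a private copy */
        }
    }
    return copies;
}
```

With the write sequence {0, 1, 2, 1, 3, 0}, letting the child exec first costs no copies. Letting the parent run through all six writes first costs four, one per distinct page. Letting the child run after two writes costs two.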
And, in fact, according to Linus, the performance difference is visible. As a result, this patch went into 2.4.4pre4 (though it does not show up in the changelog).
Other patches and updates released this week include:
Section Editor: Jonathan Corbet
April 19, 2001