[LWN.net]

Kernel development


The current kernel version is 2.4.13, which was released on October 24. Linus surprised some people by including another set of VM tweaks in the final release (i.e. without testing in a prepatch), but those tweaks had already seen some use in Andrea Arcangeli's releases. Says Linus: "See if you can break it."

Alan Cox's current patch is 2.4.12-ac5. It contains a bunch of ARM updates, the latest VM tweaks from Rik van Riel, and a number of other fixes.

On the 2.2 front, Alan has released 2.2.20-pre11, with a small set of updates and some unspecified security fixes (see this week's front page). If all goes well, this version will become the official 2.2.20 release, so interested parties are encouraged to try it out.

Toward a new way of looking at devices. Interestingly, Linux kernels through 2.4.x have no unified way of keeping track of devices. There are registries which hold lists of drivers, and various other bits and pieces, including device arrays in the drivers themselves. But if you were to ask the kernel to tell you about every device plugged into the system, it would not be able to answer. Even if one of those devices were a speech synthesizer.

Getting a better handle on devices was one of the topics discussed at the Kernel Summit last March. Now Patrick Mochel has taken things forward with a proposal for a new "driver model" in the 2.5 kernel. A number of things would change under the new scheme:

  • All buses and devices will be treated as being hot-pluggable. Devices present at boot will be treated as if they had just been plugged in.

  • A new struct device structure will be created for each physical device and bus on the system. These structures will be organized into a tree which reflects the actual configuration of the hardware. A PCI bus device thus becomes the parent node for all devices plugged into that bus. (Way back when, struct device was used for network devices, but the 2.3.14 kernel release changed that.)

  • A new virtual device driver filesystem (ddfs) type will be created. Each device in the system will export a ddfs entry, which can be used to query and change the state of the device. For example, a ddfs entry will tell whether a given device has been suspended or not.

  • Each device will have a struct device_driver which contains a small set of global operations. One of them, probe, checks for the existence of a specific device and sets it to a known state. The remove operation disconnects the driver from the device. There are also suspend and resume operations for power management functions.

  • The iobus structure will be used to track buses on the system. There will also be a struct iobus_driver containing another set of operations, mostly having to do with bus scanning and dealing with plugging and removal events.
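
The tree-of-devices idea in the list above can be sketched in ordinary userspace C. This is only an illustration under stated assumptions: the field names, the function signatures, and the helpers device_plug(), device_count() and demo_device_tree() are hypothetical and are not taken from the proposal itself. Only the struct names and the four driver operations come from the description above.

```c
/* Userspace sketch only; these are NOT the kernel's actual definitions. */
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

struct device;

/* The four operations named in the proposal; the signatures are assumed. */
struct device_driver {
    int  (*probe)(struct device *dev);    /* find the device, set a known state */
    void (*remove)(struct device *dev);   /* disconnect the driver from the device */
    int  (*suspend)(struct device *dev);
    int  (*resume)(struct device *dev);
};

/* One node per physical device or bus; children hang off their bus. */
struct device {
    const char *name;
    struct device *parent;                  /* NULL for the root of the tree */
    struct device *children[MAX_CHILDREN];
    size_t nchildren;
    struct device_driver *driver;
    int bound;                              /* set once probe() has succeeded */
};

/* Every device arrives the same way, whether at boot or at run time. */
int device_plug(struct device *parent, struct device *dev)
{
    if (parent) {
        if (parent->nchildren >= MAX_CHILDREN)
            return -1;
        parent->children[parent->nchildren++] = dev;
    }
    dev->parent = parent;
    if (dev->driver && dev->driver->probe && dev->driver->probe(dev) == 0)
        dev->bound = 1;
    return 0;
}

/* Walk the tree to count every device known to the system. */
size_t device_count(const struct device *root)
{
    size_t n = 1;
    for (size_t i = 0; i < root->nchildren; i++)
        n += device_count(root->children[i]);
    return n;
}

/* Hypothetical probe routine that always finds its device. */
static int probe_ok(struct device *dev) { (void)dev; return 0; }

/* Build root -> pci0 -> { eth0, ide0 } and return the device count. */
size_t demo_device_tree(void)
{
    struct device_driver drv = { .probe = probe_ok };
    struct device root = { .name = "root" };
    struct device pci0 = { .name = "pci0" };
    struct device eth0 = { .name = "eth0", .driver = &drv };
    struct device ide0 = { .name = "ide0", .driver = &drv };
    int rc = 0;

    rc |= device_plug(NULL, &root);
    rc |= device_plug(&root, &pci0);
    rc |= device_plug(&pci0, &eth0);
    rc |= device_plug(&pci0, &ide0);
    assert(rc == 0 && eth0.bound && ide0.bound);
    return device_count(&root);
}
```

With a structure like this, "tell me about every device plugged into the system" becomes a simple tree walk, which is exactly the question the 2.4 kernel cannot answer.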

Much of the motivation behind all this work is to do power management right. Power management is increasingly part of every computer component made, and people, rightly, want to be able to take advantage of the power management features. But doing things like suspending part or all of a system requires a detailed knowledge of that system's hardware structure. Thus this new model.

So it is not all that surprising that power management has been the topic of most of the discussion on this proposal. The initial plan called for a two-step suspend procedure: one to save device state, and one to shut the device down. It was pointed out that saving device state can involve actions like allocating memory, which can require the cooperation of other devices. So the plan now calls for a three-step suspend routine:

  1. SUSPEND_NOTIFY tells each device that a suspend is coming. No state need be saved at this point, and the device could be asked to perform further operations after this call. The driver must, however, allocate any memory it will need to save the state later on.

  2. SUSPEND_SAVE_STATE causes the driver to actually save the state of the device. It should also stop handling I/O requests at this point.

  3. SUSPEND_POWER_DOWN is the final stage, which causes the device to be physically powered down.

When the system resumes, a two-step process is followed: one to reset the devices to a known state, and one to restore the pre-suspend state and resume operation.
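
As a rough illustration of why the stages are split this way, here is a userspace sketch of a driver's suspend and resume handlers. The three SUSPEND_* names come from the proposal; everything else (struct sketch_dev, the RESUME_* names, the function names, and the assumption that each stage is delivered to every device before the next stage begins) is hypothetical.

```c
/* Userspace sketch only; not the proposed kernel interface. */
#include <assert.h>
#include <stdlib.h>

enum suspend_stage { SUSPEND_NOTIFY, SUSPEND_SAVE_STATE, SUSPEND_POWER_DOWN };
enum resume_stage  { RESUME_RESET, RESUME_RESTORE };   /* names are assumed */

struct sketch_dev {
    unsigned regs;       /* stand-in for live hardware state */
    unsigned *saved;     /* allocated at NOTIFY, filled at SAVE_STATE */
    int accepting_io;
    int powered;
};

int sketch_suspend(struct sketch_dev *d, enum suspend_stage stage)
{
    switch (stage) {
    case SUSPEND_NOTIFY:
        /* Allocate now, while every other device is still fully working. */
        d->saved = malloc(sizeof *d->saved);
        return d->saved ? 0 : -1;
    case SUSPEND_SAVE_STATE:
        if (!d->saved)
            return -1;
        *d->saved = d->regs;      /* no allocation needed at this point */
        d->accepting_io = 0;      /* stop handling I/O requests */
        return 0;
    case SUSPEND_POWER_DOWN:
        assert(!d->accepting_io); /* SAVE_STATE must have come first */
        d->powered = 0;
        d->regs = 0;              /* hardware state is lost */
        return 0;
    }
    return -1;
}

int sketch_resume(struct sketch_dev *d, enum resume_stage stage)
{
    switch (stage) {
    case RESUME_RESET:
        d->powered = 1;
        d->regs = 0;              /* known, but not yet useful, state */
        return 0;
    case RESUME_RESTORE:
        if (!d->saved)
            return -1;
        d->regs = *d->saved;
        free(d->saved);
        d->saved = NULL;
        d->accepting_io = 1;
        return 0;
    }
    return -1;
}

/* Deliver each stage to every device before moving to the next stage. */
int sketch_suspend_all(struct sketch_dev *devs, size_t n)
{
    static const enum suspend_stage order[] = {
        SUSPEND_NOTIFY, SUSPEND_SAVE_STATE, SUSPEND_POWER_DOWN
    };
    for (size_t s = 0; s < 3; s++)
        for (size_t i = 0; i < n; i++)
            if (sketch_suspend(&devs[i], order[s]) != 0)
                return -1;
    return 0;
}

int sketch_resume_all(struct sketch_dev *devs, size_t n)
{
    static const enum resume_stage order[] = { RESUME_RESET, RESUME_RESTORE };
    for (size_t s = 0; s < 2; s++)
        for (size_t i = 0; i < n; i++)
            if (sketch_resume(&devs[i], order[s]) != 0)
                return -1;
    return 0;
}

/* Suspend and resume two devices; returns 1 if their state survives. */
int demo_suspend_cycle(void)
{
    struct sketch_dev devs[2] = {
        { .regs = 0xAB, .accepting_io = 1, .powered = 1 },
        { .regs = 0xCD, .accepting_io = 1, .powered = 1 },
    };
    if (sketch_suspend_all(devs, 2) != 0)
        return 0;
    if (devs[0].powered || devs[1].powered)
        return 0;
    if (sketch_resume_all(devs, 2) != 0)
        return 0;
    return devs[0].regs == 0xAB && devs[1].regs == 0xCD &&
           devs[0].accepting_io && devs[1].accepting_io;
}
```

The point of the separate notify stage shows up in sketch_suspend(): the only step that can fail for lack of memory runs while the rest of the system is still able to help.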

There was a developing conversation on higher-level response to suspend events: things like trying to save dirty buffers to disk, synchronize RAID arrays, and so on. Trying to make all that work right was beginning to look like a pretty thorny problem, until Linus stepped on the discussion by pointing out that a suspend operation need not do all that.

If somebody removes a disk or equivalent while we're suspended, that's _his_ problem, and is exactly the same as removing a disk while the disk is running. Either the subsystem (like USB) already handles it, or it doesn't. Suspend is _not_ an excuse to do anything that isn't done at run-time.

So suspend is _not_ supposed to be equivalent of a full clean shutdown with just users not seeing it. That's way too expensive to be practical. Remember: the main point of suspend is to have a laptop go to sleep, and come back up on the order of a few _seconds_.

Nobody appears to have disagreed with this position; it was one of those "Linus moments" where he points out the important thing people have been overlooking.

The new driver model is still evolving; the latest version can be found here.

On MODULE_LICENSE and EXPORT_SYMBOL_GPL. In the hopes of clearing up some confusion, Keith Owens has posted a description of the MODULE_LICENSE and EXPORT_SYMBOL_GPL macros, and exactly what the two are intended to achieve. Recommended reading.

In search of faster pipes. Hubertus Franke and his colleagues at IBM decided to look into ways of making Linux pipes perform better. To that end, they decided to tweak two factors:

  • The size of the kernel buffer used to hold pipe data. It is normally one page (usually 4K); they experimented with buffers up to eight pages long.

  • Early awakening of readers. Normally, readers of a pipe are awakened only when a write operation completes. By waking them up after only part of the data to be written has been copied into the pipe buffer, the group hoped to improve concurrency.

The results reported are interesting: neither change improved performance on uniprocessor systems; indeed, performance often dropped. On SMP systems, however, increasing the pipe buffer size can speed things up. The early awakening helped slightly in some cases and hurt in others; it doesn't appear to be worth the effort most of the time.

The question was raised: why not try with the single-copy pipe implementation by Manfred Spraul? The IBM crew went for it, and came up with a new set of results. Single-copy pipes are not necessarily the big win that people might expect. The single-copy patch got better lmbench results in some situations, but lagged behind the IBM patches in most tests. In fact, it lagged behind even the standard Linux pipe implementation in many cases.

The final conclusion might be that increasing the buffer size may help pipe performance in some high-end, SMP situations. Other than that, the pipe code works pretty well the way it is now.
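
For those curious about how much data a pipe on their own system will hold before a writer blocks, a small userspace probe along these lines should do. It is a sketch, not part of the IBM work: pipe_capacity() is a made-up helper name, and the approach (filling a non-blocking pipe with standard POSIX calls until write() reports EAGAIN) is just one simple way to observe the buffer size discussed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Return the number of bytes a fresh pipe accepts before a write
 * would block, or -1 on error.
 */
long pipe_capacity(void)
{
    int fds[2];
    char chunk[512];   /* POSIX guarantees PIPE_BUF is at least 512 */
    long total = 0;
    int err = 0;

    if (pipe(fds) != 0)
        return -1;
    memset(chunk, 0, sizeof chunk);

    /* Make the write end non-blocking so a full pipe yields EAGAIN. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        err = (n == -1) ? errno : 0;
        break;
    }

    close(fds[0]);
    close(fds[1]);

    if (err != EAGAIN)
        return -1;
    /* Non-blocking writes of at most PIPE_BUF bytes are all-or-nothing. */
    assert(total % (long)sizeof chunk == 0);
    return total;
}
```

On a kernel with the one-page buffer described above, one would expect this to report 4096 on most systems; a kernel with a larger pipe buffer will report a correspondingly larger number.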

Other patches and updates released this week include:

  • Neil Brown has posted an implementation of tree quotas for the ext2 filesystem. Tree quotas differ from ordinary disk quotas in that they are handled on a per-tree basis. All files contained within a particular directory tree are charged to the owner of the tree, regardless of who actually owns the files.

  • The Scalable Testing Platform is an automated test system for the Linux kernel produced at the OSDL.

  • Jens Axboe has posted a new version of his patch enabling DMA to high memory without bounce buffers.

  • A scheduler patch was posted by Davide Libenzi. The patch tries to get better cache efficiency by considering how long a process has been running on a particular CPU before moving it.

  • The latest PCI Hotplug driver is available courtesy of Greg Kroah-Hartman.

  • Worth a look: Martin Devera's graphical call graphs of both Linux VM implementations.

  • Here's the latest preemptible kernel patch from Robert Love.

  • Version 0.2.1 of the IBM Enterprise Volume Management System has been announced by Kevin Corry.

  • The latest Stanford Checker run has identified numerous potential security bugs in the kernel resulting from inadequate checks on user-supplied parameters. Happily, the Checker team is not yet censoring its reports out of fear of the DMCA.

  • Martin Devera has announced a new version of his HTB queuing discipline module.

  • Vamsi Krishna has announced version 3.0 of IBM's Dynamic Probes kernel debugging system.

  • Keith Owens has released kdb v1.9 for the 2.4.13 kernel.

  • iptables 1.2.4 was released by Harald Welte.
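
The tree quota idea from the first item in the list above is easy to illustrate in userspace. The sketch below is only a model of the accounting rule and says nothing about how the patch itself is implemented: struct node, the quota_root flag, and the helper names are all hypothetical, as is the choice to fall back to the file's own owner when a file sits outside any quota tree.

```c
/* Userspace model of the accounting rule only; not the ext2 patch. */
#include <assert.h>
#include <stddef.h>

#define MAX_UID 8

struct node {
    struct node *parent;   /* containing directory, NULL at the filesystem root */
    int uid;               /* ordinary owner of this file or directory */
    int quota_root;        /* nonzero if a quota tree starts at this directory */
    long blocks;           /* disk blocks used by this node */
};

/* Return the uid that pays for this node's blocks. */
int charged_uid(const struct node *n)
{
    for (const struct node *p = n; p; p = p->parent)
        if (p->quota_root)
            return p->uid;   /* the owner of the tree pays */
    return n->uid;           /* assumption: outside any tree, the owner pays */
}

/* Add this node's blocks to the usage of whoever is charged for it. */
void charge(long usage[MAX_UID], const struct node *n)
{
    int uid = charged_uid(n);
    assert(uid >= 0 && uid < MAX_UID);
    usage[uid] += n->blocks;
}

/*
 * uid 1 owns a directory that is the root of a quota tree; uid 2 owns a
 * 10-block file inside it. Returns the blocks charged to uid 1.
 */
long demo_tree_quota(void)
{
    long usage[MAX_UID] = { 0 };
    struct node root = { .uid = 0 };
    struct node home = { .parent = &root, .uid = 1, .quota_root = 1, .blocks = 1 };
    struct node file = { .parent = &home, .uid = 2, .blocks = 10 };

    charge(usage, &root);
    charge(usage, &home);
    charge(usage, &file);
    assert(usage[2] == 0);   /* the file's own owner is not charged */
    return usage[1];
}
```

In the demo, all eleven blocks under the directory land on the tree owner's account, regardless of who owns the file inside it.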

Section Editor: Jonathan Corbet


October 25, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds