[LWN.net]


Kernel development


The current kernel release is 2.4.14, which was released on November 5. The 2.4.14 patches are all oriented toward stability, with the goal of, finally, producing a 2.4 kernel that really works for everybody. So there are no surprising changes to be found therein...

...almost. Linus has said with the 2.4.14-pre8 release that he would add no major changes before the final release, "per popular demand." Nonetheless, a last-minute tweak went in that broke the loopback driver; a small patch must be applied before the kernel will build properly.

Alan Cox's latest is 2.4.13-ac7, released on November 3. It contains a number of fixes and updates, including a large IDE driver update. Alan is, for now, working on merging his changes into 2.4.14 rather than pushing forward the "ac" series.

On the 2.2 front, Alan has released 2.2.20-pre12.

The path to 2.5 and the resolution of the VM divergence are starting to look a little clearer, thanks to some postings from Linus and Alan.

On 2.5: here's the latest from Linus:

My not-so-cunning plan is actually to try to figure out the big problems now, then release a reasonable 2.4.14, and then just stop for a while, refusing to take new features.

Then, 2.4.15 would be the point where I start 2.5.x, and where Alan gets to do whatever he wants to do with 2.4.x. Including, of course, just reverting all my and Andrea's VM changes ;)

There are, of course, those who would argue that Linus should have stopped taking new features a year or so ago...better late than never.

On the stability front, it is beginning to look like 2.4 is getting there. With the exception of the build problem mentioned above, there have been few complaints about this kernel. Stability, perhaps, is at hand.

Interestingly, it appears that Alan Cox will not be the maintainer of 2.4 once Linus moves on. In a posting to Advogato, Alan states that the 2.4 mantle will be passed to Marcelo Tosatti, who has been active in the kernel maintenance area for some time. Alan, instead, will adopt a slightly lower profile and be "spending more time concentrating on Red Hat customer related needs." Alan has always had a strong sense of what people are really using Linux for, and kernel development has benefited from that. Refreshing his view into user needs is probably a useful thing for him to do.

Regarding the virtual memory subsystem: the 2.4 kernel will almost certainly stay with the new Arcangeli implementation. This is not a particularly surprising conclusion at this point. All along, there has been very little criticism of Andrea's implementation - though there is a persistent, low-level grumbling that some more documentation would be nice. Even Alan, while not including it in his "ac" series, has never claimed that the new VM was poorly done.

The complaining, instead, has been about process: many people were simply amazed that Linus would completely replace such a fundamental component in the middle of a stable kernel series. Even Linus, while defending his choices, has acknowledged that concern:

2.5.x will obviously use the new VM regardless, and I actually believe that the new VM simply is better. I think that Alan will see the light eventually, but at the same time I clearly admit that Alan was right on a stability front for the last month or two ;)

In retrospect, the real mistake seems easy to pick out: 2.4.0 should never have been released without a rock-solid VM implementation. Even if the 2.4.0 VM implementation could have been fixed with further work (and the "ac" series was making serious progress in that regard), that degree of fixing should not have been necessary. With luck, some of the lessons that have been learned here will be applied during 2.5 development.

Authoritative hooks: permission denied. The security module patch has been under development for six months or so; its purpose is to create a standard framework for the addition of security code to the kernel. The NSA's SELinux distribution has already been reworked to use this patch. It is generally considered to be in a ready state, waiting only for the 2.5 series to start before it is proposed for inclusion into the kernel.

Until recently, however, there has been one outstanding issue: authoritative hooks. The security module patch allows modules to hook into almost any operation performed by the kernel and make security decisions. But those decisions are all restrictive: a security module can only exercise its power by vetoing an operation that would, otherwise, have been allowed. Security modules, thus, can only make security policies tighter.

There is a patch out there, however, which would add "authoritative" hooks. An authoritative hook has the ability to give a process credentials and access that it would not otherwise have had. Many security policies can be implemented without authoritative hooks, but others cannot. Access control lists (ACLs) are an example of a security mechanism requiring authoritative hooks: an ACL can grant access to a file that would otherwise be denied by the standard permission bits. If a security module cannot override those bits via an authoritative hook, then it cannot implement ACLs.

The debate over authoritative hooks has simmered on the security module list for some time. This week, it reached a conclusion of sorts when it was decreed that authoritative hooks would not be incorporated into the security module patch before that patch is submitted for inclusion in 2.5. There are various reasons for this decision, but they boil down to:

  • A security module patch implementing only restrictive hooks is far less likely to introduce security problems of its own. If security modules can increase privileges, there is a lot more latitude for mistakes that open up vulnerabilities.

  • The security module developers fear that the inclusion of authoritative hooks will make it less likely that the security module patch will be accepted into 2.5.

The door remains open for authoritative hooks sometime in the future, after the basic security module patch is part of the mainline kernel. For now, though, they will be left out.

Of course, not everybody is happy with this decision. In particular, a couple of developers from SGI (who are working on an ACL patch) have made it clear that they think the decision is wrong:

It is our position that the LSM group has decided to compromise the product in order to make the sale. We believe this is poor practice from both political and technical directions.

The authoritative hook developers worry that compatibility issues will prevent the patch's inclusion in the future. The patch changes the security module interface, and, if included later, will break existing modules. Over the course of 2.5 development, however, the kernel developers may be more than willing to pay that price if authoritative hooks seem worthwhile.

Fixing up /proc. It all started with a posting from Rusty Russell giving a proposal (and patch) for a new /proc implementation. Rusty's patch is aimed mostly at the kernel interface to /proc - what is required for code in the kernel to export an interface via that filesystem. It is indeed true that the current /proc API, though much improved over earlier versions, is unwieldy to work with and requires a lot of supporting code. The proposed replacement simplifies that interface greatly, to the point of requiring a single line of code for a module that wishes to export a simple variable via /proc.

The new API drew a few comments, but most people did not seem particularly concerned about it. Almost nobody is attached to the current way of doing things, after all. On the other hand, everybody seems to have ideas about how to change the other side of the interface: how /proc appears to user space.

There is a great deal of frustration with the current /proc. There is no standard for files in that directory: how they are named, what they contain, how they are formatted, etc. As a result, /proc is messy and inconsistent, and it is difficult to write applications that work well with it. The format of /proc files has also tended to change unpredictably over time, adding compatibility headaches for application writers.

So, the kernel list has seen a substantial discussion on what a new /proc should look like on the application side. At a minimum, people would like to see a set of defined standards for what goes in that directory. A set of informal standards already exists: /proc is supposed to be moving toward a scheme where files are in structured subdirectories, and each file contains exactly one value. Others would like to be far more formal, however; see postings by Kai Henningsen and Stephen Satchell which attempt to nail down how each /proc file should look.

Then again, if you want to impose a format on /proc, why not make it fully buzzword compliant and use XML? The xmlprocfs project has done just that, providing an implementation with enough angle brackets for everyone. See the XML version of /proc/devices for an example of how it looks. There are numerous XML supporters out there, but most kernel developers seem to think that XML is overkill.

Another contingent thinks that the fundamental idea behind /proc - human-readable, ASCII data - is incorrect. Instead, they would make /proc files into binary data, one value per file. The argument behind this approach is that it eliminates the need for the kernel to ASCII-encode everything and the need for applications to have decoders. Why not just pass the data directly? All that overhead would be eliminated, as well as the hassles of keeping up with unstable /proc formats and the ongoing potential for buffer overflow problems.

Daniel Phillips did some profiling, and found that, when "top" is running, the kernel spends a significant amount of its time using sprintf() to encode values. So it appears that the cost of an ASCII /proc is worth thinking about.

Nonetheless, a binary /proc will not be making an appearance anytime soon; Linus has made that clear:

In short: /proc is ASCII, and will so remain while I maintain a kernel. Anything else is stupid.

Chances are, actually, that /proc in 2.5 will look very similar to what users see now in 2.4. Massive changes in that interface are not only controversial; they also are guaranteed to break no end of applications. People really don't like it when their programs break. The kernel developers may not hesitate to break internal interfaces, but they are far more careful about causing problems in user space. That inertia alone is likely to ensure that massive /proc changes don't happen anytime soon.

A new devfs. The devfs device filesystem has never been an uncontroversial development. Kernel hackers argued for years over whether it should find a place in the mainline kernel; even after Linus settled that issue, the debate has gone on. In more recent times, however, the focus has been on the quality of the code - or the lack thereof. Even devfs author Richard Gooch has admitted that devfs, as it appears in current kernels, has substantial problems with race conditions and holes.

Some developers have been strongly critical of Richard for allowing these problems to persist. Richard has not helped the situation by being more focused on adding new features than fixing bugs.

With luck, this chapter, too, may be coming to a close. Richard has posted a new devfs core implementation which works enough reference counting and locking into the code to, one hopes, eliminate the problems. The changes are large, and Richard is not currently presenting the code as being ready for production. Nonetheless, he's more than interested in hearing about problems that brave testers might find. Once this code seems solid, it will be sent to Linus as a replacement for the existing devfs code in 2.4. (The most recent version of this patch, as of this writing, is here).

Other patches and updates released this week include:

  • Nathan Scott has posted a proposal for an extended attributes API. Among other things, this API allows access control lists for both XFS and ext2 to work with the same user space interface.

  • Speaking of ACLs, Andreas Gruenbacher has released version 0.7.22 of the ext2 ACL implementation.

  • A design document for the ReiserFS v4 transaction subsystem, written by Joshua MacDonald and Hans Reiser, is now available.

  • Version 0.9.0beta9 of the ALSA sound driver system is available.

  • Pavel Machek has posted a new swsusp patch for the 2.4.13 kernel. (swsusp will suspend a running Linux system to disk).

  • User-mode Linux 0.50-2.4.13 has been released by Jeff Dike.

  • Keith Owens has released version 1.5 of the new kernel build mechanism.

  • The High Resolution Timers project has released the first version of its patch.

  • The latest preemptible kernel patch is available from Robert Love. Among other things, this patch now supports the ARM architecture.

  • Derek Glidden has announced an analysis of the two 2.4 VM implementations, along with the 2.2 version. He concludes that some work remains to be done, but both 2.4 versions are better than what was available in 2.2. Also, a look at the swap performance of the 2.4 VM implementations has been made available by "safemode."

  • Daniel Phillips has gotten back to his ext2 directory index patch and released a new patch. "Still for use on test partitions only." He has also posted a set of test results with the new patch which show some nice performance improvements.

  • William Irwin has posted a new, bitmap-based boot-time memory allocator.

  • Version 0.2.2 of the enterprise volume management system has been released by Kevin Corry.

  • kdb v1.9 for 2.4.14 has been announced by Keith Owens.

  • Andrew Morton has released version 0.9.15 of the ext3 filesystem for the 2.4.14 kernel.

  • The November 5 security module patch is available; included now is a pair of example modules.

  • Version 1.4g of the loop-AES filesystem encryption patch is available.

  • The first release from the OpenGFS project - version 0.0.91 - has been announced.

Section Editor: Jonathan Corbet


November 8, 2001

Eklektix, Inc. Linux powered! Copyright © 2001 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds