Kernel development

The current kernel release remains 2.4.9. The current 2.4.10 prepatch is 2.4.10pre2, which contains a number of fixes and cleanups, but nothing too revolutionary.

Alan Cox's current patch is 2.4.9ac3. This one contains everything from 2.4.9, with the exception of the min()/max() stuff and the virtual memory changes. The VM work, at least, is likely to find its way into the "ac" series slowly, in order to make it easier to assess the effects of each change individually. Also present in the 2.4.9ac series is a large merge of the MIPS port.

Whither 2.5? Back on June 21, Linus said that the 2.5 series "will open in a week or two." That, of course, was more than two months ago. As of September 4, it will have been a full eight months since 2.4 came out. Never in ten years of Linux development has there been such a long period without a development kernel.

This hiatus is making itself felt in a number of ways. One is that development items are finding their way into the "stable" kernel series. Back in January, Linus had laid out a stern policy on patches for 2.4:

"In order for a patch to be accepted, it needs to be accompanied by some pretty strong arguments for the fact that not only is it really fixing bugs, but that those bugs are _serious_ and can cause real problems."

Patches since then have included API changes, wholesale driver replacements, zero-copy networking, and numerous other changes that would seem to have bent the above rule just a little bit. Meanwhile, much of the serious development work that is on tap for 2.5 remains isolated and untested, or not done at all. And developers are increasingly wondering when the 2.5 series will start.

It is important, certainly, to hold the line on development work while the 2.4 kernel stabilizes. The developers need to maintain their focus on stability until the job is really done, and an open development series could easily distract many of them.
But eight months is a long time without a development kernel. It seems time for 2.5 to start.

The min() and max() issue... Linus returned from Finland and put in his contribution to the debate on the changes to min() and max():

"Yes, the new Linux min/max macros are different from the ones people are used to. Yes, I expected a lot of flamage. And no, I don't care one whit. Unlike EVERY SINGLE other C version of min/max I've ever seen, the new Linux kernel versions at least have a fighting chance in hell of generating correct code."

In other words, he does not intend to back down on this change, and people should just deal with it and get on with things. Most of the kernel hackers seem to have accepted this, with, perhaps, a final grumble or two, and the discussion has died down.

An open question, still, is what Alan Cox will do in the "ac" series. 2.4.9ac3 does not have "the min/max thing which needs to be dealt with." Last week Alan had said that he would not incorporate this change in this form, though he does agree with the basic goals of the change. This change, however, affects a lot of files throughout the kernel, and maintaining a kernel that differs from Linus's in this respect would be a lot of work for Alan and many other kernel developers. It would probably be much easier for everybody involved to just adopt Linus's new way of doing things and be done with it.

Then, there was the well-intentioned guy who suggested supporting both the new and the old min/max macros, and surrounding each call with a #ifdef. That idea didn't get too far...

In search of smart readahead. This week saw a complaint that disk read performance is very slow when numerous threads are all reading simultaneously. One suggestion that came out quickly was to increase the readahead limit for disk files. It's an approach that has worked for some people, but a more general solution requires a deeper look.
"Readahead," of course, is the act of speculatively reading a file's contents beyond what a process has asked for, with the idea that the process will get around to asking for it soon. When properly done, readahead can greatly increase read performance on a system, and most operating systems implement the technique. A larger readahead limit can help performance by creating more contiguous I/O operations for the disk, and by making it easier to stay ahead of the reading process.

So increasing the readahead size would seem like a fairly straightforward decision. Until, of course, you realize that readahead requires memory, and the system might just have one or two other possible uses for that memory.

In fact, it can even be worse than that. As Rik van Riel points out, awful things can happen if the system tries to perform more readahead than it has memory for. When memory gets tight, pages used for readahead can be reclaimed for other purposes, with the result that the data so carefully read ahead gets dropped on the floor. When the reading process gets around to asking for that data, it has to be read from the disk again. In this mode, all readahead does is increase memory pressure and duplicate I/O operations; the system would be better off giving up on readahead.

The solution, it would seem, is to be smarter about just how much readahead is done. When lots of memory is available, the readahead window should be large; as memory gets tight, that window should be reduced. There are several ideas on how that smartness should be implemented, however.
Rik van Riel has stated his intention to proceed (with others) on an approach which dynamically scales the readahead window size "using heuristics not all that much different from TCP window scaling." Stay tuned for a patch.
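As a rough illustration of what a TCP-like heuristic could look like, here is a minimal Python sketch. It is not Rik van Riel's patch, which had not been posted; it simply assumes an additive-increase, multiplicative-decrease rule of the kind TCP congestion control uses. The class name, the method names, and the page limits are all hypothetical. The window grows by one page each time read-ahead data is consumed before being reclaimed, and is halved each time read-ahead pages are thrown away unused.

```python
from dataclasses import dataclass


@dataclass
class ReadaheadWindow:
    """Hypothetical per-file readahead window, measured in pages."""
    pages: int = 8        # assumed starting window
    min_pages: int = 1    # assumed lower bound
    max_pages: int = 128  # assumed upper bound

    def on_hit(self) -> None:
        # Read-ahead pages were used by the reader: grow additively.
        self.pages = min(self.max_pages, self.pages + 1)

    def on_evicted_before_use(self) -> None:
        # Read-ahead pages were reclaimed before use, a sign of memory
        # pressure: shrink multiplicatively.
        self.pages = max(self.min_pages, self.pages // 2)


def simulate(events, window=None):
    """Apply a sequence of events (True = pages wasted, False = pages used)
    and return the window size after each one."""
    if window is None:
        window = ReadaheadWindow()
    history = []
    for wasted in events:
        if wasted:
            window.on_evicted_before_use()
        else:
            window.on_hit()
        history.append(window.pages)
    return history


if __name__ == "__main__":
    # Two successful readaheads, two wasted ones, then a recovery.
    print(simulate([False, False, True, True, False]))
```

The asymmetry is the point of the design: the window backs off quickly when readahead starts duplicating I/O, and recovers slowly once memory is available again.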
Journaling filesystem performance comparison. Andrew Theurer (of IBM) has posted the results of a performance comparison between several Linux filesystems. The standard ext2 filesystem beat all of the journaling filesystems by a fair amount; JFS was the fastest among the journaling systems. ReiserFS came in last in this set of tests.

It turns out that Randy Dunlap, too, has been testing journaling filesystems. He is using a different benchmarking tool, but has come up with roughly similar results.

The ReiserFS testing, as it turns out, was done with a default mount option that reduces performance (but which saves disk space). People interested in ReiserFS performance should mount with the notail option (-o notail). The above tests will be rerun with that option, but no results had been posted as of "press" time.

Other patches and updates released this week include:
Section Editor: Jonathan Corbet
August 30, 2001