The Ottawa Linux Symposium 2000

Herein is our coverage of OLS. There is far more going on here than can be handled by one person, so it's far from complete. Nonetheless, we hope to capture some of the flavor of the event.
Wednesday at OLS
The Ottawa Linux Symposium got off to a bit of a rough start. After sending around a note encouraging everybody to show up early for registration, they managed to get going an hour and a half late. In the end, they had to tell people to go to Miguel's noon keynote without badges, because otherwise they couldn't get through the lines in time. Oh well.
The conference organizers had a great idea, however: set up a wireless LAN in the convention center and Les Suites hotel, and lend out PCMCIA cards to the first hundred or so attendees. Too bad you have to load a separate driver, and that the system doesn't appear to be working all that well yet. The congress center was full of people glaring at their laptops in a puzzled way. The idea is nice, anyway...
Miguel showed up to give his keynote, only to note that nobody had gotten around to setting up a projector for his slides. At least the ensuing panic and delay allowed them to get more people through the registration process. (See the discussion of Miguel's talk in the July 20 LWN).
Theodore Ts'o started the day off with an overview of what's happening with Linux filesystems. Of course, he had to begin with half an hour of messing around with the video projector and the sound system before he could get going... Something in the Congress Center building is highly inimical to radio waves, to the detriment of both the wireless networking and the wireless microphones the conference is trying to use. Ted ended up simply yelling at the audience.
Filesystems, in the Ts'o view of the world, are divided up into three broad categories.
Another variant of interest is log-structured filesystems. These essentially treat the disk as a big circular buffer; blocks are never overwritten in place - every operation instead appends a new copy of the data that supersedes the old one. Writes are fast because they always happen at the end of the buffer, and consistency is pretty much guaranteed. Log-structured filesystems have problems, however, with read performance and with their need for a "garbage collection" pass to keep space free at the end of the buffer area.
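The append-and-supersede idea can be sketched in a few lines. This is a toy illustration (a list standing in for the disk, keys standing in for blocks), not any real filesystem's on-disk format:

```python
# A toy log-structured store: the "disk" is an append-only list of
# (key, value) records. Updates never overwrite; they append a new
# version, and reads scan backwards for the most recent one.

class LogStore:
    def __init__(self):
        self.log = []          # the circular buffer, simplified to a list

    def write(self, key, value):
        # Writes always go to the end of the log: fast and sequential.
        self.log.append((key, value))

    def read(self, key):
        # Reads must hunt for the latest version: potentially slow.
        for k, v in reversed(self.log):
            if k == key:
                return v
        raise KeyError(key)

    def garbage_collect(self):
        # Keep only the newest version of each key, reclaiming space.
        latest = {}
        for k, v in self.log:
            latest[k] = v
        self.log = list(latest.items())

store = LogStore()
store.write("a", 1)
store.write("a", 2)        # supersedes the first record; nothing is overwritten
assert store.read("a") == 2
assert len(store.log) == 2  # both versions still occupy space...
store.garbage_collect()
assert len(store.log) == 1  # ...until the garbage collector runs
```

The sketch shows both sides of the tradeoff Ted described: `write` is a cheap append, while `read` and `garbage_collect` carry the cost.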
Another interesting sub-area is filesystems for flash memory. Flash brings its own constraints: it is expensive and thus small, and it can only be written a finite number of times. The limit on writing means that usage needs to be spread out over an entire flash array to avoid losing pieces too early. On the other hand, flash memory has no seek delays, so fragmentation is not a problem.
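Wear leveling - spreading writes across the whole array - can be illustrated with a trivial allocator. The block count and erase budget below are made-up numbers for demonstration only:

```python
# A toy wear-leveling allocator: each flash "block" has a finite erase
# budget, so writes are steered to the least-worn block and no single
# block dies early. NUM_BLOCKS and ERASE_LIMIT are hypothetical values.

NUM_BLOCKS = 4
ERASE_LIMIT = 3   # pretend each block survives only three erases

erase_counts = [0] * NUM_BLOCKS

def allocate_block():
    # Pick the block with the fewest erases so far (wear leveling).
    block = min(range(NUM_BLOCKS), key=lambda b: erase_counts[b])
    if erase_counts[block] >= ERASE_LIMIT:
        raise RuntimeError("flash worn out")
    erase_counts[block] += 1
    return block

# Twelve writes are spread evenly: every block is erased exactly
# three times, instead of one block absorbing all twelve.
for _ in range(NUM_BLOCKS * ERASE_LIMIT):
    allocate_block()
assert erase_counts == [ERASE_LIMIT] * NUM_BLOCKS
```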
A couple of filesystems exist for flash memory now. "cramfs" is a compressed, read-only filesystem added recently to the 2.3 kernel development series. JFFS is a log-structured filesystem. Log-structured systems automatically spread usage over an entire device, and are thus well suited to flash memory.
Ted started out by saying that filesystems are expected to provide top performance for everybody. At this point, however, it also becomes clear that there are uses for application-specific filesystems. As a result, don't expect Linux to have fewer filesystems anytime soon - diversity helps Linux to be useful in many different situations.
Time was also taken out to address the issue of multi-stream files - those which, like Macintosh files, have more than one "fork" of data. Miguel de Icaza, on Wednesday, advocated support for such files; Ted Ts'o thinks it is a bad idea. None of the currently-used network protocols supports multi-stream files, nor do the standard Linux tools. Any new API for multi-stream files would be nonstandard, since POSIX has no concept of such files. While Miguel states that the protocols and utilities simply need to be fixed, Ted says that the same functionality can be obtained in other ways, and that developers should "think different" to get the results they need.
Stephen Tweedie presented the work that he has done with the ext3 filesystem. In an (until now) unheard-of move, he actually started the talk on time, without difficulties...
Ext3 is the time-tested ext2 filesystem with the addition of journaling capabilities. Journaling works by bunching up the changes to a filesystem into atomic transactions; either all of the changes actually happen, or none of them do. For example, the simple task of writing some data to a file can involve numerous steps:
In a journaling filesystem, all of those operations are first written to a special journal file, followed by a special "commit" record. Only when that has been done is the filesystem itself touched. If the system goes down in the middle, the whole thing is replayed from the journal file, so everything gets done. fsck becomes a thing of the past.
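The journal-then-commit-then-replay sequence can be sketched in a few lines. This is purely illustrative - real ext3 transactions cover disk blocks, not dictionary keys:

```python
# A toy write-ahead journal: every operation is recorded in the journal,
# followed by a COMMIT marker, before the "filesystem" (a dict) is
# touched. On recovery, only fully committed transactions are replayed;
# a transaction that never reached its COMMIT record is discarded.

COMMIT = ("COMMIT",)

def replay(journal):
    fs = {}
    pending = []
    for record in journal:
        if record == COMMIT:
            # Transaction is complete: apply all of its changes at once.
            for key, value in pending:
                fs[key] = value
            pending = []
        else:
            pending.append(record)   # buffered until COMMIT is seen
    return fs                        # uncommitted records vanish harmlessly

# One committed transaction, then a crash mid-transaction (no COMMIT):
journal = [("inode", "updated"), ("block", "data"), COMMIT,
           ("inode", "half-written")]
fs = replay(journal)
assert fs == {"inode": "updated", "block": "data"}
```

The atomicity falls out naturally: because the half-written transaction never got its COMMIT record, replay ignores it, and the filesystem is consistent no matter where the crash happened.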
Stephen's goal with this work was to add the journaling capability to ext2 in a minimal fashion. He guarantees a 100% consistent filesystem at boot time, no matter what happens. But many other things that could go into ext2, such as b-trees, extent mapping, etc., have been left out in the name of simplicity.
The architecture used actually separates journaling out from the filesystem itself. A new journaling layer actually handles the details of making journaling work; it exports the capability to any module that needs it. So the addition of journaling to other filesystems should be relatively easy, if the interest is there.
For now, the journaling code writes everything to the journal file - including actual data written to files. In the long term, a more efficient implementation will journal only the file metadata. As long as the filesystem takes pains to write user data to the file before the metadata, there is no need to journal the user data. That, obviously, will greatly decrease the amount of data that has to go through the journal, and help to fix the "poor" (Stephen's word) write performance of ext3 now.
The ext3 journaling implementation is stable now, and working on a number of systems. There are still some user interface issues to deal with - some of the utilities need some work, and it's possible for an overzealous system administrator to delete the journal file. Those sorts of issues, along with a port to 2.4, will be dealt with in the future.
Steve Best of IBM finished out the afternoon with a discussion of the port of JFS to Linux. JFS is IBM's journaling filesystem, the release of which was announced at LinuxWorld back in February. Since then, nine "code drops" have been done, with successively higher levels of functionality.
JFS boasts some impressive features. File sizes can go up to 4 petabytes. Its journaling is metadata-only and thus, hopefully, fast. JFS is also an extent-based filesystem, which can make things much faster for large files. Dynamic allocation of inodes is done, adding flexibility while reducing the space wasted by static inode tables. And the filesystem can be defragmented and resized while online. There are still a few difficulties, though. The version of the filesystem they are porting comes from OS/2, and thus, for example, still does not understand case-sensitive filenames. (It retains the OS/2 disk structure, making filesystems portable between Linux and OS/2).
JFS's journal recovery is performed by a user-space program ("logredo"). Some members of the audience made an issue of the fact that it will be hard to use JFS as the root filesystem, since it must be mounted (to find logredo) before the journal has been replayed. The approach to that problem is to "mount read-only and hope for the best" until logredo can be run. Some people were unimpressed by this (ext3 does log recovery in the kernel, and does not have this problem), but the fact is that ext2 filesystems work that way now.
It was asked whether JFS might ever use the independent journaling layer that has been implemented for ext3. "Not now" is the answer - they want to complete their port before getting into large architectural changes.
The wireless networking is working at this point. It's just a matter of fixing a couple of things in the scripts and being in the right place. As it turns out, the rooms where the talks are held do not qualify as a "right place," frustrating all those who want to play around on the net while waiting for the speaker to say something interesting. Tip for future conference organizers: wireless networks are an absolutely great idea, but don't forget the technical support side of things. Even a "networking hints" bulletin board would have helped a lot of people get going.
This morning Richard Gooch had scheduled a 10:00 BOF session on devfs. At about 10:10, the conference folks got around to setting up a room and putting up a sign. As a result, there were all of three of us there. Nonetheless, it was a fun conversation. Mr. Gooch has gotten past a critical hurdle in getting devfs into the kernel (after more than two years of effort), but there are still many people who oppose its existence. The devfs wars are not over yet.
At this point, the long-term fate of devfs probably rests in the hands of the distributors. If devfs starts cropping up in some high-profile distributions, it will be used by default. Thus SGI, which is sponsoring work on devfs, is said to be pushing some of the distributors to go in that direction. One large distributor, MandrakeSoft, is said to be seriously considering enabling devfs in its standard kernel.
Back in the main conference program, Deepak Saxena gave a presentation on the I2O bus and the status of its support under Linux. I2O was the subject of some concern in the Linux community a couple of years ago, due to the fact that its specification was only available under NDA. Those days are over now, of course, with the specification being openly available on the I2O web site and Intel supporting Linux I2O development.
I2O is driven by the problem that intensive I/O loads can swamp even modern processors. Driving a gigabit ethernet card, for example, can easily reduce a system to servicing interrupts and doing nothing else. The I2O approach is to place another CPU (the I/O Processor or IOP) between the main processor and the devices; this CPU offloads as much of the I/O processing load as possible.
There is nothing new about this sort of architecture - your author learned to program (far too many) years ago on a Control Data Cyber mainframe running KRONOS that was organized in just this way. When the I/O load gets too intense, it's time to throw another computer at the problem.
Anyway, the I/O processor now needs to know the details of driving the specific peripherals which are on the I2O bus. Perhaps the best feature of I2O, in the end, is that it has defined how the various types of cards are supposed to operate. Network cards, for example, have a specific interface that they must implement. Suddenly there is no more need for many dozens of ethernet drivers - a single driver will suffice.
All I/O is done through communication with the IOP. A message-passing scheme is used, which has good and bad points. On the good side, many operations can be performed with a single message in each direction, greatly reducing the interrupt load on the main processor. This scheme also tends to increase latencies, however. Certain applications, such as network communications with lots of short packets, can suffer from this latency increase.
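The interrupt-versus-latency tradeoff can be shown with a toy batching model. Everything here (batch sizes, the notion of a "slot" of delay) is an invented simplification for illustration:

```python
# A sketch of the message-passing tradeoff: grouping several I/O
# requests into one message cuts the number of messages (and hence
# interrupts on the main CPU), at the cost of early requests waiting
# for their batch to fill before anything is sent.

def send_batched(requests, batch_size):
    # Returns (number of messages sent, worst-case per-request delay,
    # measured in request slots spent waiting for the batch to fill).
    messages = 0
    delays = []
    for i, _ in enumerate(requests):
        position = i % batch_size
        if position == 0:
            messages += 1            # a new message starts here
        # A request waits until the rest of its batch arrives.
        delays.append(batch_size - 1 - position)
    return messages, max(delays)

# Eight requests, unbatched: eight messages, no added latency.
assert send_batched(range(8), 1) == (8, 0)
# Batched four at a time: only two messages, but up to three slots
# of extra delay - bad news for lots of short network packets.
assert send_batched(range(8), 4) == (2, 3)
```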
The core I2O implementation for Linux exists now. The block storage device driver works, and the system can boot from an I2O disk. The LAN device driver works as well - it gets throughput on high-speed networks similar to that of the regular PCI drivers, but with significantly lower CPU overhead. There was no specific mention of other types of drivers, such as sequential storage.
For the future, look for higher-level operations to be split off onto the IOP. For example, the "socket" device class will implement an entire TCP/IP stack, taking all the protocol overhead out of the main processor. There is also an interest in implementing direct device-to-device transfers, allowing, for example, static files to be served to the web without involving the CPU.
Those of us stuck with non-I2O systems were addressed instead by Jes Sorensen's talk on the optimization of SMP device drivers. This talk was a distillation of his experience with the AceNIC gigabit ethernet driver, which went from having real performance problems to being able to blast out some serious data.
The techniques involved are highly detailed, and probably not of great interest to those who don't hack device drivers.
Lars Marowsky-Brée gave a session on SuSE's work porting FailSafe, a high-availability system being open-sourced by SGI, to Linux. He took some time to distinguish this offering from what a number of other Linux vendors are doing: FailSafe is not "just another two-node solution." He was a little critical of the number of vendors who are reimplementing high availability systems so that they can have their own entry in the market. SuSE, instead, is going with code that scales up to 16 nodes, and which has been out there and working for five years.
FailSafe itself deals only with high availability. Thus it does not, for example, handle load balancing. Its job is to keep track of a set of network nodes and the services running on them. If something stops working, FailSafe will find a new place for it to run.
The system is very application oriented - it can be set up to do things like moving a large Oracle server from one node to another. It is also implemented entirely in user space, perhaps making it unique among high availability systems. No kernel additions are required.
FailSafe will be released in August (LinuxWorld "might make a good time" for the announcement). The code - 350,000 lines worth - will be licensed under the GPL, with the LGPL applied to some libraries that application developers can use. Meanwhile, binary snapshots are actually available now.
Despite the kernel-heavy nature of this conference, there is interest in the applications side as well. Dan Winship gave a presentation on Evolution, Helix's mail/address book/calendaring system. It is, according to Dan, "the most buzzword-compliant application for Linux." But it's something that we want to have anyway.
The core of the application remains the mail agent, where people spend a lot of time. It has a great many features, including the ability to display and send HTML-formatted mail. It tries, says Dan, to be "well behaved" about sending HTML mail. It can deal with local, POP, and IMAP mail now; there is interest in writing other back ends to allow, for example, pulling mail from services like Hotmail while stripping out the advertisements.
The "vFolder" scheme is intended to be a more flexible way of handling mail folders. Instead of setting up physical folders, Evolution throws everything into one big database and lets users overlay folders in any way they want. Thus it's easy to add a folder like "messages from Miguel." The user can then go on and make another one called "unread messages from Miguel containing the word 'sucks'". That folder, according to Dan, would be one of the larger ones...
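The idea - folders as saved queries over one message pool, rather than physical locations - can be sketched directly. The message fields and sample data below are invented for illustration:

```python
# A sketch of the vFolder idea: all messages live in one pool, and a
# "folder" is just a predicate evaluated over that pool. The same
# message can therefore appear in any number of folders at once.

messages = [
    {"from": "miguel", "read": False, "body": "this sucks"},
    {"from": "miguel", "read": True,  "body": "looks good"},
    {"from": "alan",   "read": False, "body": "patch attached"},
]

def vfolder(predicate):
    # A virtual folder is a query, not a place where mail is stored.
    return [m for m in messages if predicate(m)]

from_miguel = vfolder(lambda m: m["from"] == "miguel")
unread_sucks = vfolder(lambda m: m["from"] == "miguel"
                       and not m["read"] and "sucks" in m["body"])
assert len(from_miguel) == 2
assert len(unread_sucks) == 1
```

Adding a new folder costs nothing but a predicate, which is why overlapping folders like "messages from Miguel" and "unread messages from Miguel containing the word 'sucks'" come for free.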
Evolution stresses the integration between its components. Thus the address book is available for mail composition; it can also snarf "vcards" out of incoming mail to add new entries. Similar tricks are available with the calendar, making things like meeting scheduling easier. The calendar can also be used to mark messages as "reply to this later." If you've not actually made the reply within the allotted time window, you'll get an alarm to remind you.
Evolution exports its components for other systems to use. Thus it's easy to write scripts that work with the address book, for example. What Evolution will not do ("we're not stupid") is run scripts that come in via mail. Someday there will be the ability to add more front ends to the system - examples include web-based, text, and the obligatory emacs interface.
Other upcoming goodies include the ability to stick notes onto messages - that is apparently in the code now, though it has to be explicitly enabled in the build process. Project management features are on the list. They also want to integrate the "gnomacs" emacs component for mail composition. Finally, integration of encryption (PGP and GPG) is on the list - but not there yet.
David Miller's keynote was scheduled at 3:15 as the last event. Around 3:20 or so, the conference staff got around to setting up a video projector and the sound system... The lack of the projector, at least, can be understood, since David projected his talk from slides. But Alan Cox had a special introduction in mind...the very first mail message he ever got from David. It was a "subscribe" message sent to the linux-multicast mailing list - not to the request address...
David's talk was a history of the Linux kernel from a David-centric point of view. It was a lively, anecdote-laden presentation that would be impossible to reproduce here.
Thus ends the Ottawa Linux Symposium, except for the Helix Code party to be held tonight. This has been a top-quality event - despite my occasional pokes at the conference organization. This event is tightly focused on the code, with a near absence of hype. LinuxWorld-like events are good for what they are, but it is at events like OLS that the Linux community really can be seen.
I'll be back next year.
Copyright © 2000 Eklektix, Inc. All rights reserved.
Linux ® is a registered trademark of Linus Torvalds