User-Mode Linux

July 25, 2001
J. Corbet

Jeff Dike gave a well-attended presentation on User-Mode Linux (UML) - his port of Linux to itself. UML allows a Linux kernel to be run in user space, with all the advantages that brings; see the February 15, 2001 LWN Kernel Page for a description of what UML can do.

The most interesting part of the talk for many of us was the discussion of where UML will go in the future. Jeff has found himself the creator of a tool with some surprising possibilities, and it is not yet clear where it will end up.

Some of the immediate work is a bit more mundane, though: performance and feature completeness. For example, UML's tracing thread, which causes four context switches for every system call made, is slated for elimination. The memory model also needs some work to cut down on all the mapping and unmapping that is required every time control switches to a different UML process. That may require some support from the regular (host) kernel, with new primitives to allow the easy creation, population, and changing of address spaces.

The emulation of access to the hardware also needs to be completed, since some utilities will not run without it. UML would also be far more useful for the debugging of device drivers if it could provide restricted access to I/O ports. Progress is happening: access to I/O memory was added recently.

Finally, on the completeness side, it is still not possible to run UML within a UML system. There's not a whole lot of demand for that feature, but the ability to run nested UML systems would still be a nice milestone to reach.

A nice new feature is the copy-on-write (COW) block driver. This driver allows a UML kernel to boot on a shared, read-only filesystem; when it makes changes, the new data is stored on a separate, virtual device. If you are using UML to debug kernel changes, this feature is especially handy: when your bleeding-edge kernel crashes, just delete the COW device and start up a new UML with no filesystem checking whatsoever. If a COW device has changes that you want to keep, just use the uml_moo utility to merge them back in.
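The copy-on-write idea can be shown with a toy model. What follows is only a sketch of the general technique, not the UML driver or the uml_moo algorithm; the class name, method names, and the tiny block size are all invented for illustration.

```python
class CowBlockDevice:
    """Toy copy-on-write block device (hypothetical, for illustration).

    Reads fall through to a shared, read-only backing image unless the
    block has been rewritten; writes only ever touch a private overlay.
    """

    def __init__(self, backing, block_size=4):
        if block_size <= 0 or len(backing) % block_size:
            raise ValueError("backing image must be a whole number of blocks")
        self.backing = bytes(backing)  # shared image, never modified here
        self.block_size = block_size
        self.overlay = {}              # block number -> private copy

    def _check(self, n):
        if not 0 <= n < len(self.backing) // self.block_size:
            raise IndexError("block number out of range")

    def read_block(self, n):
        self._check(n)
        if n in self.overlay:
            return self.overlay[n]
        start = n * self.block_size
        return self.backing[start:start + self.block_size]

    def write_block(self, n, data):
        self._check(n)
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block long")
        self.overlay[n] = bytes(data)

    def discard(self):
        """Drop every change, like deleting the COW device after a crash."""
        self.overlay.clear()

    def merged(self):
        """Return a new image with the overlay folded into the backing data."""
        image = bytearray(self.backing)
        for n, data in self.overlay.items():
            start = n * self.block_size
            image[start:start + self.block_size] = data
        return bytes(image)
```

In this model, several devices can share one backing image because none of them ever writes to it; throwing away an overlay returns a device to the pristine state, and merged() plays the part the article assigns to uml_moo.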

Then, there are the fancy new features that UML can provide. Take, for example, the "hostfs" file system. Its real purpose is to provide limited access to the host filesystem. But, once you have that, it's relatively easy to hook in new features - like the ability to reproduce filesystem writes to multiple destinations. UML thus becomes a tool for the management of mirrored filesystems.
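To make the mirroring idea concrete, here is a minimal sketch of reproducing one write under several destination directories. It is ordinary user-space Python, not hostfs code; the function name and its parameters are assumptions made up for this example.

```python
import os


def mirrored_write(relative_path, data, roots):
    """Write the same bytes to relative_path under every directory in roots.

    Hypothetical helper: it only illustrates fanning one write out to
    several destinations and says nothing about how hostfs does it.
    Returns the list of files written, in the same order as roots.
    """
    written = []
    for root in roots:
        target = os.path.join(root, relative_path)
        # Create intermediate directories so every mirror has the same layout.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(data)
        written.append(target)
    return written
```

A real mirroring layer would also have to decide what to do when one destination fails partway through; this sketch simply lets the exception propagate.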

Jeff has schemes in mind to add fancy new consoles for UML kernels. Soon there may be fancy graphical interfaces, the obligatory emacs interface, scriptable interfaces, and, perhaps, a way to control a UML kernel via IRC.

Another effort out there is porting UML to non-Linux hosts. It's a fun project, and it could be useful for people wanting to test applications on Linux without the trouble of actually having a Linux system around. A Windows port of UML has been started, though it has some formidable technical obstacles to face and its completion date is unclear.

While UML got its start as a kernel debugging tool, there is, apparently, an increasing amount of interest in using UML for its virtual machine capabilities. A UML kernel functions as a sort of jail which strongly restricts the access of any application running there. As such, it's an ideal platform for running untrusted software or implementing honeypot systems.

There are a couple of things that need to be filled in, however, for UML to fill this role properly. For example, kernel memory is accessible to processes running under UML; a suitably nasty application could use that access to escape the jail. The uaccess macros (used by the kernel to check pointers passed in from user space) are incomplete as well.
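The kind of check the uaccess macros perform can be sketched as a simple range test. This is an illustration of the general idea only, not the kernel's macros; the function name is invented, and the 0xC0000000 limit is an assumed user/kernel boundary chosen for the example rather than a value from the talk.

```python
ASSUMED_USER_LIMIT = 0xC0000000  # assumption: first address above user space


def user_range_ok(addr, size, limit=ASSUMED_USER_LIMIT):
    """Return True if [addr, addr + size) lies entirely below limit.

    Hypothetical stand-in for a user-pointer check. The comparison is
    written as addr <= limit - size so that a huge addr + size cannot
    slip through by wrapping around in fixed-width arithmetic.
    """
    if addr < 0 or size < 0:
        return False
    return size <= limit and addr <= limit - size
```

A kernel that skips such a check for some access path will happily read or write its own memory on behalf of a process, which is exactly the kind of hole that lets an application out of the jail.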

UML still lacks the ability to simulate SMP systems. SMP would be a most useful feature, allowing the testing of kernel code in a multiprocessor environment even if the developer has no SMP hardware. SMP is "dead simple," says Jeff; expect it soon.

Once SMP is in place, the door is opened for an interesting concept: UML clusters. There is little that requires independent UML processors to be running on the same host. With some address space cleverness, UML processors could be distributed across a network. There are some minor difficulties, such as the fact that "performance will suck," due to the need to "fault" pages across the network. So it doesn't look like something people will be rushing out to run in the near future.

But, as Jeff points out, a UML cluster can be thought of as an extreme form of the NUMA (non-uniform memory access) hardware architecture. As Linux support for NUMA improves, UML performance should improve as well. Then you have a platform for the creation of clusters that allows for straightforward administration, user-created clusters, or multiple clusters running on the same hardware.
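The cost of faulting pages across the network is easy to see in a toy model. The sketch below is not a description of any planned UML design; it assumes the simplest possible scheme, in which each page lives on exactly one node and migrates to whichever node touches it, and every name in it is hypothetical.

```python
class ToyPageCluster:
    """Toy model of pages migrating between nodes (illustration only)."""

    def __init__(self):
        self.owner = {}         # page number -> node currently holding it
        self.remote_faults = 0  # how many touches needed a network transfer

    def touch(self, node, page):
        """Record node accessing page; return True if it had to be fetched."""
        holder = self.owner.get(page)
        if holder is None:
            self.owner[page] = node  # first touch: allocate locally
            return False
        if holder == node:
            return False             # already local, no network traffic
        self.remote_faults += 1      # page must cross the network
        self.owner[page] = node
        return True
```

Two nodes that take turns touching the same page pay for a transfer on nearly every access, which is the extreme end of the non-uniform memory behavior described above; keeping each node's working set local is what a NUMA-aware kernel tries to do.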

The old saying holds that there is no problem in computer science that cannot be solved through the addition of another layer of indirection. UML is such a layer, and people are just beginning to figure out what kinds of problems it can solve. It will be interesting to see where this project - which remains a "spare time" volunteer effort on Jeff Dike's part - will end up.

Back to OLS 2001 coverage