Date: Wed, 4 Nov 1998 13:36:21 -0500
From: "Theodore Y. Ts'o" <tytso@MIT.EDU>
To: Greg Mildenhall <greg@networx.net.au>
Subject: Re: Volume Managers in Linux

   Date: Wed, 4 Nov 1998 15:17:40 +0800 (WST)
   From: Greg Mildenhall <greg@networx.net.au>

   On Tue, 3 Nov 1998, Theodore Y. Ts'o wrote:
   > Well, for filesystems that don't implement multiple device spanning,
   > they can use MD (or LVM) today.  But there are advantages to getting
   > the filesystem involved; the filesystem can more efficiently handle
   > placement issues, and it can more easily handle evacuating a disk
   > used in the middle of a logical volume.

   How often do you do that?  If anyone does it often enough to make it
   more important than a cleaner, easier-to-debug fs implementation, then
   I would be so rude as to suggest this is administrator error, not a
   kernel problem.

You have a filesystem which spans multiple disks, using all 7 SCSI disks
in the SCSI chain.  It's not RAID'ed, mainly due to cost constraints.
Whoops!  A disk in the middle of the filesystem starts generating soft
errors, and you know that means you want to replace it before it dies
for good.  (Disks will often give you plenty of warning before they head
crash, if you know what to listen for.)

Suppose you have enough free space in this filesystem that you can
afford to lose one of your drives.  If the filesystem is involved, and
knows where all of the boundaries are, and the filesystem format allows
for discontiguous block addresses, or hierarchical (volume #/block #)
block addresses, then it's not a problem: you can run a program which
moves all of the blocks and inodes located on the failing disk to free
space on the other disks, and then replace the bad disk.

Now take the exact same situation using LVM.  Because the filesystem
isn't involved, it has no idea which blocks are on the failing disk and
so need relocating.
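The filesystem-involved evacuation described above can be sketched in a few lines.  This is a toy model of my own, not code from any real filesystem; the names and layout are invented.  The point is only that when the block map stores hierarchical (volume, block) addresses per file, a userspace tool can drain one member disk into free space on the others:

```python
# Toy model (not real filesystem code): every data block is addressed
# as a (volume #, block #) pair, so nothing is pinned to one disk.

class MultiVolumeFS:
    def __init__(self, volumes):
        # volumes: {volume_id: total block count}
        self.volumes = volumes
        # per-file block map: file name -> list of (volume, block) addresses
        self.files = {}
        self.used = {v: set() for v in volumes}

    def allocate(self, name, nblocks):
        """Grab nblocks free blocks, spilling across volumes as needed."""
        addrs = []
        for vol, size in self.volumes.items():
            for blk in range(size):
                if blk not in self.used[vol] and len(addrs) < nblocks:
                    self.used[vol].add(blk)
                    addrs.append((vol, blk))
        if len(addrs) < nblocks:
            raise RuntimeError("out of space")
        self.files[name] = self.files.get(name, []) + addrs
        return addrs

    def _grab_one(self, avoid):
        # Find one free block on any volume not in `avoid`.
        for vol, size in self.volumes.items():
            if vol in avoid:
                continue
            for blk in range(size):
                if blk not in self.used[vol]:
                    self.used[vol].add(blk)
                    return (vol, blk)
        raise RuntimeError("not enough free space to evacuate")

    def evacuate(self, failing_vol):
        """Move every block off the failing volume and rewrite the map.

        Possible only because addresses are discontiguous and per-file:
        the relocation tool just updates (volume, block) pairs.
        """
        for name, addrs in self.files.items():
            for i, (vol, blk) in enumerate(addrs):
                if vol == failing_vol:
                    addrs[i] = self._grab_one(avoid={failing_vol})
                    self.used[vol].discard(blk)
```

With three 4-block volumes and a 6-block file, evacuating volume 0 relocates its four blocks to the remaining free space, after which the bad disk can be pulled.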
Worse, the filesystem is assuming a contiguous block numbering scheme,
and certain filesystem structures such as the inode table have to be
located at specific places on the disk, so you can't completely evacuate
the failing disk.

At the LVM layer, you can move slices of the logical volume to free
slices located on other physical volumes --- but that assumes you can
actually add an extra disk.  In this example, the SCSI chain is full, so
you can't.  (There may also be cost or availability issues --- the extra
disk might simply not be available when you need it.)  You could try
doing a filesystem resize operation, to free up slices at the end of the
logical volume --- but that will mean compacting files and doing a lot
of block copying, possibly copying data onto the failing disk, which is
not a smart move!

You can avoid these problems with the current LVM structure if you
require that a free disk always be available which can be plugged in, so
that you have a place to sweep data from a failing physical volume.
This may be an acceptable tradeoff to some, but it points out that there
*are* advantages to having the filesystem involved in handling the
multiple-volume case.

						- Ted

P.S.  I had earlier mentioned Digital Unix's Advanced Filesystem and
DVD-ROM's UDF filesystem as filesystems which support multiple volumes
as part of their design.  I just realized I had omitted another very
important example --- Microsoft's NTFS v5.  (Of course, some might
believe that's a good reason not to follow that design choice, but the
point is that there are plenty of examples of designs which get the
filesystem involved when handling multiple volumes.)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
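For contrast with the filesystem-involved case, the dead end at the LVM layer can also be modeled in a few lines.  Again this is a toy sketch of my own, not the real Linux LVM code: the volume manager sees only opaque physical extents, so evacuating a failing physical volume means finding a free extent elsewhere for *every* extent on it --- whether or not the filesystem above actually uses the data --- and with the SCSI chain full there may be nowhere to put them:

```python
# Toy model (not real LVM code): a logical volume is a flat run of
# logical extents, each mapped onto a (physical volume, extent) pair.

class LVM:
    def __init__(self, pvs, extents_per_pv):
        # logical extent index -> (pv, physical extent) mapping
        self.lv_map = {}
        self.free = {pv: set(range(extents_per_pv)) for pv in pvs}

    def create_lv(self, n_extents):
        # Lay the logical volume out across the PVs in order; the
        # filesystem on top sees one flat, contiguous block range.
        le = 0
        for pv, free in self.free.items():
            while free and le < n_extents:
                self.lv_map[le] = (pv, free.pop())
                le += 1
        if le < n_extents:
            raise RuntimeError("not enough physical extents")

    def evacuate_pv(self, failing_pv):
        # pvmove-style migration: every extent on the failing PV must
        # land on a free extent of some other PV.  If none exists, the
        # evacuation fails even when the filesystem has plenty of
        # unused blocks -- the LVM layer cannot see them.
        for le, (pv, pe) in list(self.lv_map.items()):
            if pv != failing_pv:
                continue
            dest = next((p for p, f in self.free.items()
                         if p != failing_pv and f), None)
            if dest is None:
                raise RuntimeError("no free extents: need a spare PV")
            self.lv_map[le] = (dest, self.free[dest].pop())
            self.free[failing_pv].add(pe)
```

With three fully-allocated PVs, `evacuate_pv` has nowhere to migrate and fails; add a fourth (spare) PV and the same call succeeds --- which is exactly the "always keep a free disk available" tradeoff.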