Date:	Wed, 4 Nov 1998 13:36:21 -0500
From:	"Theodore Y. Ts'o" <tytso@MIT.EDU>
To:	Greg Mildenhall <greg@networx.net.au>
Subject: Re: Volume Managers in Linux

   Date: Wed, 4 Nov 1998 15:17:40 +0800 (WST)
   From: Greg Mildenhall <greg@networx.net.au>

   On Tue, 3 Nov 1998, Theodore Y. Ts'o wrote:
   > Well, for filesystems that don't implement multiple device spanning,
   > they can use MD (or LVM) today.  But there are advantages to getting the
   > filesystem involved; the filesystem can more efficiently handle
   > placement issues, and it can more easily handle evacuating a disk used
   > in the middle of a logical volume.  

   How often do you do that? If anyone does it often enough to make it more
   important than a cleaner, easier-to-debug fs implementation, then I would
   be so rude as to suggest this is administrator error, not a kernel prob.

You have a filesystem which spans multiple disks, using all 7 SCSI disks
in the SCSI chain.  It's not RAID'ed, mainly due to cost constraints.
Whoops!  A disk in the middle of the filesystem starts generating soft
errors, and you know that means that you want to replace it before it
dies for good.  (Disks will often give you plenty of warning before they
head crash, if you know what to listen for.)

Suppose you have enough free space in this filesystem so you can afford
to lose one of your drives.  If the filesystem is involved, and knows
where all of the boundaries are, and the filesystem format allows for
discontiguous block addresses, or hierarchical (volume #/block #)
block addresses, then it's not a problem: you can run a program which
moves all of the blocks and inodes located on the failing disk to free
space on the other disks, and then replace the bad disk.
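
To make that concrete, here is a rough sketch in C of what such an
evacuation pass might look like.  Every name in it (fs_blockaddr,
block_in_use, find_free_block, and so on) is invented purely for
illustration; this is not any real filesystem's code.

    /*
     * Hierarchical block address: (volume #, block # within volume).
     */
    struct fs_blockaddr {
        unsigned int  vol;   /* which physical volume */
        unsigned long blk;   /* block number on that volume */
    };

    /*
     * Walk every allocated block on the failing volume, copy it to
     * free space on some other volume, and repoint the owning inode.
     * All of the helpers called here are hypothetical.
     */
    int evacuate_volume(struct fs *fs, unsigned int failing)
    {
        struct fs_blockaddr old, new;

        old.vol = failing;
        for (old.blk = 0; old.blk < fs->vol_size[failing]; old.blk++) {
            if (!block_in_use(fs, &old))
                continue;
            if (find_free_block(fs, failing, &new) < 0)
                return -ENOSPC;    /* not enough free space left */
            copy_block(fs, &old, &new);
            update_pointers(fs, &old, &new);  /* fix inode block maps */
            free_block(fs, &old);
        }
        return 0;
    }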

Now take the exact same situation using LVM.  Because the filesystem
isn't involved, it has no idea which blocks live on the failing disk
and therefore need relocating.  Worse, the filesystem assumes a
contiguous block numbering scheme, and certain filesystem structures
such as the inode table have to be located at specific places on the
disk, so you can't completely evacuate the failing disk.
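
To see why, compare the fixed layout a conventional filesystem
assumes.  Here is an ext2-flavoured sketch (simplified constants, not
the real on-disk structures): metadata lives at computed offsets in
one flat block address space, so there is no way to express "these
blocks now live on a different disk."

    /*
     * Simplified ext2-style layout: one flat block address space,
     * metadata at fixed, computed offsets.  (Illustrative constants;
     * the real layout also holds superblock and descriptor copies.)
     */
    #define BLOCKS_PER_GROUP 8192UL

    unsigned long inode_table_block(unsigned int group)
    {
        /* group start + block bitmap + inode bitmap */
        return group * BLOCKS_PER_GROUP + 2;
    }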

At the LVM layer, you can move slices of the logical volume to free
slices located on other physical volumes --- but that assumes that you
can actually add an extra disk.  In this example, the SCSI chain is
full, so you can't add the extra disk.  (There may also be cost or
availability issues --- the extra disk might simply not be available
when you need it.)
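
The shape of that operation, in another made-up sketch (none of these
names are real LVM code): each logical extent is remapped wholesale,
and moving one presupposes a free extent on some other physical
volume.

    /* One logical extent, mapped to (physical volume, physical extent). */
    struct lv_extent {
        unsigned int pv;   /* physical volume index */
        unsigned int pe;   /* physical extent on that PV */
    };

    /*
     * Move every extent off a failing PV.  This only works if some
     * other PV has a free extent to receive each one --- i.e. you
     * need spare physical space somewhere, such as a hot-spare disk.
     */
    int pv_evacuate(struct lv_extent *map, unsigned int nextents,
                    unsigned int failing)
    {
        unsigned int i;
        struct lv_extent dst;

        for (i = 0; i < nextents; i++) {
            if (map[i].pv != failing)
                continue;
            if (alloc_free_extent(failing, &dst) < 0)
                return -ENOSPC;   /* no spare physical space */
            copy_extent(&map[i], &dst);
            map[i] = dst;         /* remap the logical extent */
        }
        return 0;
    }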

You could try doing a filesystem resize operation, to free up slices at
the end of the logical volume --- but that means compacting files and
doing a lot of block copying, possibly copying data onto the failing
disk, which is not a smart move!

You can avoid these problems with the current LVM structure if you
require that you always have a free disk available which can be plugged
in, so that you have a place to sweep data from one failing physical
volume to another.  This may be an acceptable tradeoff for some, but it
shows that there *are* advantages to having the filesystem involved in
handling the multiple-volume case.

						- Ted

P.S.  I had earlier mentioned Digital Unix's Advanced Filesystem and
DVD-ROM's UDF filesystem as filesystems which support multiple volumes
as part of their design.  I just realized I had omitted another very
important example --- Microsoft's NTFS v5.  (Of course, some might
believe that's a good reason not to follow that design choice, but the
point is that there are plenty of examples of designs which get the
filesystem involved when handling multiple volumes.)

