From:	 willy@ldl.fc.hp.com (Matthew Wilcox)
To:	 linux-fsdevel@vger.kernel.org
Subject: File Locking in 2.5
Date:	 Mon, 30 Apr 2001 12:39:23 -0600
Cc:	 mikosh@us.ibm.com, dixonbp@us.ibm.com, tridge@samba.org


This is a dump of http://infradead.org/~willy/locking_manifesto.html,
and I'll update that page as I get comments.

                           File Locking in Linux 2.5
                                       
   The file locking code in Linux 2.4 has a number of problems I'd like
   to address during 2.5 development. Here's a list:
     * The code is in pretty desperate need of a rewrite. It's overly
       complex and has accumulated some cruft over the years.
     * It's too incestuous with lockd.
     * It doesn't provide facilities needed by other networked/clustered
       filesystems.
     * There is a need for byte-range locks which aren't removed by
       close(dup(fd)).
       
   Here's a scheme which will hopefully address the above problems.
   Feedback welcome.
   
Providing the right facilities for networked/clustered filesystems

     * All filesystems will fill in their ->lock method.
     * Local filesystems should all use local_lock() for this method,
       unless they have a good reason to provide their own facilities.
     * nfs (client) will provide a ->lock() method which performs an RPC
       to the remote lockd. It does not call local_lock().
     * lockd (on the server) calls the underlying fs' ->lock() method.
       Note this is potentially recursive (ie we can reexport an NFS
       filesystem and have locking work.)
       
   Note that this clears the way for filesystems to provide non-POSIX
   semantics (eg Netware, SMB OpLocks, etc). There is no requirement for
   any filesystem to use the local_lock() function.
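   To make the dispatch concrete, here is a minimal userspace sketch.
   Everything in it is hypothetical: struct file_stub, struct fs_ops,
   vfs_lock() and the toy local_lock() (no lock merging, no unlocking)
   are illustrative stand-ins, not the 2.4 or 2.5 kernel API. The point
   is only that the generic layer calls f->ops->lock() and carries no
   POSIX knowledge itself, while a network filesystem plugs in its own
   method instead of local_lock().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct range_lock {
    long start, end;   /* inclusive byte range */
    int owner;         /* e.g. a pid */
    int write;         /* 1 = exclusive, 0 = shared */
};

struct file_stub;

struct fs_ops {
    /* Every filesystem fills this in; returns 0 or a negative errno. */
    int (*lock)(struct file_stub *f, struct range_lock *rl);
};

#define MAX_LOCKS 16

struct file_stub {
    const struct fs_ops *ops;
    struct range_lock held[MAX_LOCKS];   /* locks granted on this file */
    int nheld;
};

static int ranges_overlap(const struct range_lock *a, const struct range_lock *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Toy generic implementation a local filesystem would use as ->lock.
 * No merging or unlocking; just conflict detection and recording. */
int local_lock(struct file_stub *f, struct range_lock *rl)
{
    for (int i = 0; i < f->nheld; i++) {
        const struct range_lock *h = &f->held[i];
        if (h->owner != rl->owner && ranges_overlap(h, rl) &&
            (h->write || rl->write))
            return -EAGAIN;   /* conflicting lock held by someone else */
    }
    if (f->nheld == MAX_LOCKS)
        return -ENOLCK;
    f->held[f->nheld++] = *rl;
    return 0;
}

/* A network filesystem supplies its own method and does not call
 * local_lock(); this stub pretends the server granted the request. */
static int remote_lock(struct file_stub *f, struct range_lock *rl)
{
    (void)f;
    (void)rl;
    return 0;
}

const struct fs_ops local_fs_ops  = { .lock = local_lock };
const struct fs_ops remote_fs_ops = { .lock = remote_lock };

/* The generic layer just dispatches; it has no POSIX knowledge. */
int vfs_lock(struct file_stub *f, struct range_lock *rl)
{
    return f->ops->lock(f, rl);
}
```

   A local filesystem points ->lock at local_lock(); the remote one keeps
   no local state at all, which is what lets lockd re-export it.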
   
   lockd has an interesting problem. The semantics of fcntl(F_SETLKW) are
   that the process has to sleep until the lock is granted, or a signal
   interrupts the sleep. Clearly it's incredibly inefficient for lockd to
   spawn a new thread every time it wants to make a lock which would
   block. So at first glance, we need a different type of lock -- put the
   lock on the list of blocked locks, and return -EAGAIN (-EWOULDBLOCK?).
   Then, when that lock is finally granted, notify lockd that it now has
   the lock; lockd can pass that notification on to the client, and the
   client process unblocks.
   
   But what if we simply replace the blocking lock with the would-block
   lock? That implies that the caller of ->lock() decides what to do with
   the -EWOULDBLOCK return code -- if it's fcntl(), it puts the process
   to sleep; if it's lockd, it just carries on.
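   A sketch of the would-block convention, under simplifying assumptions:
   a single whole-file exclusive lock per object, and hypothetical names
   throughout (struct lock_req, try_lock(), unlock(), lockd_lock()).
   try_lock() never sleeps; on conflict it queues the request and returns
   -EWOULDBLOCK. The grant arrives later through a per-request notify
   callback, which is where lockd would send the grant back to its
   client. An fcntl(F_SETLKW) caller would instead sleep until
   req->granted becomes 1 or a signal arrives; that path is left out to
   keep the example single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request: one whole-file exclusive lock, for brevity. */
struct lock_req {
    int owner;
    int granted;
    void (*notify)(struct lock_req *req);  /* called when a queued request is granted */
    struct lock_req *next;                 /* link on the blocked list */
};

struct lockable {
    struct lock_req *holder;    /* NULL when unlocked */
    struct lock_req *blocked;   /* FIFO of waiting requests */
};

/* Never sleeps: grants immediately, or queues and returns -EWOULDBLOCK. */
int try_lock(struct lockable *l, struct lock_req *req)
{
    if (l->holder == NULL) {
        l->holder = req;
        req->granted = 1;
        return 0;
    }
    req->granted = 0;
    req->next = NULL;
    struct lock_req **pp = &l->blocked;
    while (*pp)
        pp = &(*pp)->next;
    *pp = req;
    return -EWOULDBLOCK;
}

/* Releasing hands the lock to the first waiter and tells its owner. */
void unlock(struct lockable *l)
{
    struct lock_req *next = l->blocked;
    l->holder = next;
    if (next) {
        l->blocked = next->next;
        next->granted = 1;
        if (next->notify)
            next->notify(next);
    }
}

/* lockd-style caller: queue the request and carry on. The callback is
 * where the grant would be sent to the NFS client; here it just counts. */
static int grants_sent;
static void lockd_notify(struct lock_req *req) { (void)req; grants_sent++; }

int lockd_lock(struct lockable *l, struct lock_req *req)
{
    req->notify = lockd_notify;
    return try_lock(l, req);   /* -EWOULDBLOCK is not an error for lockd */
}
```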
   
   A clustered filesystem might call out to the network and say `I want
   to put this lock on this file'. Either some other node in the cluster
   says `Denied', `Blocked' or `Granted' (ie handles the request), OR no
   other node accepts responsibility for the lock, in which case we lock
   it locally by calling local_lock().
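   The clustered case can be sketched the same way. The reply codes, the
   cluster_ask_fn hook and the errno mapping (-EACCES for `Denied',
   -EWOULDBLOCK for `Blocked') are assumptions made for illustration; a
   real cluster filesystem would choose its own protocol. The part taken
   from the description above is the fallback to the local implementation
   when no node claims the lock.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical replies from the other cluster nodes. */
enum cluster_reply { CL_NO_OWNER, CL_DENIED, CL_BLOCKED, CL_GRANTED };

struct lock_request {
    long start, end;
    int owner;
};

/* Stand-in for asking the cluster over the network. */
typedef enum cluster_reply (*cluster_ask_fn)(const struct lock_request *rq);
/* Stand-in for the generic local_lock() path. */
typedef int (*local_lock_fn)(const struct lock_request *rq);

/* A clustered filesystem's ->lock: ask the cluster first, and only
 * lock locally if no node accepts responsibility. */
int cluster_fs_lock(const struct lock_request *rq,
                    cluster_ask_fn ask, local_lock_fn local)
{
    switch (ask(rq)) {
    case CL_GRANTED:
        return 0;
    case CL_DENIED:
        return -EACCES;
    case CL_BLOCKED:
        return -EWOULDBLOCK;   /* the caller decides whether to wait */
    case CL_NO_OWNER:
    default:
        return local(rq);      /* nobody claimed it: lock locally */
    }
}

/* Illustrative stubs for trying the function out. */
static int local_calls;
static enum cluster_reply reply_no_owner(const struct lock_request *rq) { (void)rq; return CL_NO_OWNER; }
static enum cluster_reply reply_blocked(const struct lock_request *rq) { (void)rq; return CL_BLOCKED; }
static int local_stub(const struct lock_request *rq) { (void)rq; local_calls++; return 0; }
```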
   
lockd

   lockd does the following to recover from a downed server:
for_each_lock
        if (belongs_to_my_fs)
                foo();

   This requires it to have access to the global list of locks, which is
   a bad thing to have anyway.
   
   I've written some replacement code which Trond approved of:
for_each_inode(sb)
        for_each_lock(inode)
                foo();

   With the changes above, even this code can go away. The nfs client can
   keep a per-fs list of locks, and reestablish them at server restart.
   No need to interact with the local locking at all.
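   A minimal sketch of such a per-fs list, assuming hypothetical names
   (struct nfs_mount, nfs_remember_lock(), nfs_reclaim_locks()). The
   client records each lock the server granted on that mount and, after a
   server restart, walks only its own list, handing each entry to a
   reclaim callback that stands in for the RPC re-sending the lock.
   Nothing here touches the local lock lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one lock the server granted on this mount. */
struct nfs_held_lock {
    long start, end;
    int owner;
    int write;
    struct nfs_held_lock *next;
};

struct nfs_mount {
    struct nfs_held_lock *locks;   /* per-filesystem list, not global */
};

/* Remember a lock after the server granted it. Returns 0, or -1 on OOM. */
int nfs_remember_lock(struct nfs_mount *m, long start, long end, int owner, int write)
{
    struct nfs_held_lock *l = malloc(sizeof(*l));
    if (!l)
        return -1;
    l->start = start;
    l->end = end;
    l->owner = owner;
    l->write = write;
    l->next = m->locks;
    m->locks = l;
    return 0;
}

/* After a server restart, replay every lock on this mount. `reclaim`
 * stands in for the RPC; returns how many locks were re-established. */
int nfs_reclaim_locks(struct nfs_mount *m, int (*reclaim)(const struct nfs_held_lock *))
{
    int ok = 0;
    for (struct nfs_held_lock *l = m->locks; l; l = l->next)
        if (reclaim(l) == 0)
            ok++;
    return ok;
}

/* Illustrative stub: pretend the restarted server accepts everything. */
static int reclaim_calls;
static int fake_reclaim(const struct nfs_held_lock *l) { (void)l; reclaim_calls++; return 0; }
```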
   
Non-POSIX locks

   We already provide five different lock types:
     * 4.4 BSD flock() locks. These are whole-file locks which are
       inherited across a fork(). They are not checked for deadlock.
     * POSIX fcntl() locks. These are byte-range locks which are
       inherited across an exec(), but not a fork(). They are checked for
       deadlock. (A subtype of this is the LFS variant which allows for
       64-bit offsets.)
     * Leases. These are whole-file locks which are broken when another
       process attempts to open the file for an operation which would
       conflict with the lease type. When they are broken, the owner
       receives a signal and must ensure the file is in a consistent
       state before releasing their lease.
     * Share modes. These are whole-file mandatory locks. No other
       process may open a file which would conflict with the Share Mode
       on the file. Use flock() with the LOCK_MAND flag to set a Share
       Mode.
     * Mandatory Locks. These are byte-range mandatory locks. To use
       them, mount the filesystem with the `mand' option enabled, and set
       the file mode to g-x g+s. POSIX locks applied to this file will
       now be mandatory. Mandatory locks do not prevent accesses via
       mmap(). You should not use Mandatory locks in new code.
       
   The proposal mentioned above would add a sixth -- whatever the
   filesystem supports. Ncpfs already does this through an ioctl, but
   that could be supported `natively' through this new scheme.
   
   I want to add another byte-range lock, which looks and smells like a
   POSIX fcntl lock except that it is not removed by closing any fd which
   happens to be open on this file. Samba keeps a list of open fds which
   are not currently in use on any locked file to work around this
   stupidity in the spec. I'd like the external interface for this to be
   fcntl(F_SETLK_NP) and fcntl(F_SETLKW_NP). Clearly F_GETLK does not need
   to be altered or replaced.
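   The behaviour being complained about is easy to demonstrate from
   userspace with the standard fcntl() interface. The sketch below takes
   a write lock, asks from a forked child whether the file is locked (a
   process never conflicts with its own POSIX locks, so the query has to
   come from another process), then does close(dup(fd)) and asks again.
   The helper names and the scratch file path are illustrative. The
   second query should find the file unlocked, because closing any
   descriptor for the file drops all of the process's fcntl locks on it.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Query from a child process: 1 if a conflicting write lock exists on
 * the whole file, 0 if not, -1 on error. */
static int locked_by_other(int fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock q = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                           .l_start = 0, .l_len = 0 };
        if (fcntl(fd, F_GETLK, &q) == -1)
            _exit(2);
        _exit(q.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Returns 1 if the lock survived close(dup(fd)), 0 if it was dropped,
 * -1 on error. */
int lock_survives_close_dup(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };   /* whole file */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }

    int before = locked_by_other(fd);   /* expected: 1 */
    close(dup(fd));                     /* drops all our POSIX locks on the file */
    int after = locked_by_other(fd);    /* expected: 0 */

    close(fd);
    unlink(path);
    if (before != 1 || after < 0)
        return -1;
    return after;
}
```

   With today's fcntl() locks this returns 0; a lock with the semantics
   proposed above would be expected to survive and return 1.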
   
Restructuring

   locks.c still runs almost entirely under the BKL. An earlier attempt
   to move it to a different locking scheme was thwarted when the code
   was integrated into 2.4.0-test9 while I was on holiday, without my
   having submitted it to Linus. Grumble. I plan to move it to _one_ spinlock
   to cover all lock-related structures, and I think that will be
   possible with the plan described above (since this code will no longer
   sleep).
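   Why a single spinlock only works once nothing sleeps can be seen in a
   small userspace model. A C11 atomic_flag stands in for the kernel's
   spinlock_t, and the list and function names are hypothetical. Because
   the blocked and granted lists sit under the same lock, moving a waiter
   from one to the other is one critical section, and because no path
   blocks while holding the lock, other CPUs only ever spin briefly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One lock for every lock-related structure (userspace stand-in). */
static atomic_flag file_lock_spin = ATOMIC_FLAG_INIT;

static void lock_lists(void)
{
    while (atomic_flag_test_and_set_explicit(&file_lock_spin, memory_order_acquire))
        ;   /* busy-wait: acceptable only because no holder ever sleeps */
}

static void unlock_lists(void)
{
    atomic_flag_clear_explicit(&file_lock_spin, memory_order_release);
}

struct flk {
    struct flk *next;
    int granted;
};

/* Both lists are protected by file_lock_spin. */
static struct flk *blocked_list, *granted_list;

/* Append a waiter to the FIFO of blocked requests. */
void add_waiter(struct flk *w)
{
    lock_lists();
    w->next = NULL;
    struct flk **pp = &blocked_list;
    while (*pp)
        pp = &(*pp)->next;
    *pp = w;
    unlock_lists();
}

/* Move the oldest waiter to the granted list in one critical section. */
void grant_first_waiter(void)
{
    lock_lists();
    struct flk *w = blocked_list;
    if (w) {
        blocked_list = w->next;
        w->next = granted_list;
        granted_list = w;
        w->granted = 1;
    }
    unlock_lists();
}
```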
   
   As soon as lockd no longer needs to keep its fingers inside locks.c, I
   want to remove the global list of locks. It's also used by /proc/locks
   -- which probably needs to go away anyway. So what's useful about
   /proc/locks? I'd like to be able to see which locks my process has,
   and which processes have a lock on a given file. The former is easy --
   /proc/$PID/locks can be constructed relatively easily from the fds
   open by that process. The latter? I don't know. Ideas welcome.
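   For the per-process view, a sketch of the idea: walk the task's fd
   table, look at the locks on each open file, and print the ones the
   task owns, skipping files already reached through another fd. The
   structures, the fixed eight-slot fd table and the output format are
   made up for illustration; they are not the real /proc/locks format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NR_FDS 8   /* hypothetical fixed-size fd table */

struct plock {
    long start, end;
    int owner_pid;
    int write;
};

struct open_file {
    unsigned long ino;          /* identifies the underlying file */
    const struct plock *locks;  /* every lock on that file */
    int nlocks;
};

struct task {
    int pid;
    const struct open_file *fds[NR_FDS];   /* NULL = unused slot */
};

/* Print one line per lock owned by t on any file it has open.
 * Returns the number of lines written. */
int show_task_locks(const struct task *t, FILE *out)
{
    int n = 0;
    for (int fd = 0; fd < NR_FDS; fd++) {
        const struct open_file *f = t->fds[fd];
        if (!f)
            continue;
        int seen = 0;   /* same file via an earlier fd? */
        for (int j = 0; j < fd; j++)
            if (t->fds[j] && t->fds[j]->ino == f->ino)
                seen = 1;
        if (seen)
            continue;
        for (int i = 0; i < f->nlocks; i++) {
            const struct plock *l = &f->locks[i];
            if (l->owner_pid != t->pid)
                continue;
            fprintf(out, "%d: POSIX %s ino=%lu %ld-%ld\n", fd,
                    l->write ? "WRITE" : "READ", f->ino, l->start, l->end);
            n++;
        }
    }
    return n;
}
```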
   
Links

   [1] POSIX file locking
   [2] Olaf Kirch's page on NLM (warning: out of date)
     _________________________________________________________________
   
   
    Matthew Wilcox <matthew@wil.cx>

References

   1. http://www.opengroup.org/onlinepubs/007908799/xsh/fcntl.html
   2. http://www.swb.de/personal/okir/lockd.html