From: willy@ldl.fc.hp.com (Matthew Wilcox)
To: linux-fsdevel@vger.kernel.org
Subject: File Locking in 2.5
Date: Mon, 30 Apr 2001 12:39:23 -0600
Cc: mikosh@us.ibm.com, dixonbp@us.ibm.com, tridge@samba.org
This is a dump of http://infradead.org/~willy/locking_manifesto.html,
and I'll update that page as I get comments.
File Locking in Linux 2.5
The file locking code in Linux 2.4 has a number of problems I'd like
to address during 2.5 development. Here's a list:
* The code is in pretty desperate need of a rewrite. It's overly
complex and has accumulated some cruft over the years.
* It's too incestuous with lockd.
* It doesn't provide the facilities needed by networked and clustered
filesystems.
* There's a need for range locks which aren't removed by
close(dup(fd)).
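The close(dup(fd)) problem is easy to see from user space. The sketch below is only an illustration, assuming Linux and a writable /tmp; lock_survives() and child_sees_lock() are made-up names. It takes a classic fcntl() write lock, optionally closes a duplicate of the descriptor, and then asks a child process whether the lock is still there (the check has to happen in another process, because a process never conflicts with its own POSIX locks).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask a child process whether byte 0 of path is write-locked.
 * Returns 1 = locked, 0 = unlocked, 2 = error in child, -1 = fork/wait error. */
static int child_sees_lock(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 1 };
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Lock byte 0 of a temp file, optionally run close(dup(fd)), then report
 * whether another process still sees the lock (1) or not (0). */
int lock_survives(int do_close_dup)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1 };
    int result = -1;
    if (fcntl(fd, F_SETLK, &fl) == 0) {
        if (do_close_dup)
            close(dup(fd));  /* drops every POSIX lock this process has on the file */
        result = child_sees_lock(path);
    }
    close(fd);
    unlink(path);
    return result;
}
```

lock_survives(0) should report the lock as held (1), while lock_survives(1) reports it gone (0), even though the descriptor that took the lock is still open.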
Here's a scheme which will hopefully address the above problems.
Feedback welcome.
Providing the right facilities for networked/clustered filesystems
* All filesystems will fill in their ->lock method.
* Local filesystems should all use local_lock() for this method,
unless they have a good reason to provide their own facilities.
* nfs (client) will provide a ->lock() method which performs an RPC
to the remote lockd. It does not call local_lock().
* lockd (on the server) calls the underlying fs' ->lock() method.
Note this is potentially recursive (i.e., we can re-export an NFS
filesystem and have locking work).
Note that this clears the way for filesystems to provide non-POSIX
semantics (e.g., NetWare, SMB oplocks, etc.). There is no requirement
for any filesystem to use the local_lock() function.
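A minimal sketch of this dispatch, assuming heavily simplified types: struct fs_ops, struct lock_req, vfs_lock() and the two stub methods are hypothetical stand-ins, not the kernel's real structures, and no RPC is actually sent.

```c
#include <errno.h>

/* Hypothetical, simplified lock request: not the kernel's struct file_lock. */
struct lock_req { long start, end; int type; };

/* Hypothetical per-filesystem operations; every filesystem fills in ->lock. */
struct fs_ops {
    const char *name;
    int (*lock)(struct lock_req *req);
};

/* Stand-in for the generic local_lock(); pretends the lock was granted. */
static int local_lock(struct lock_req *req)
{
    (void)req;
    return 0;
}

/* Stand-in for the NFS client method: it would send an RPC to the remote
 * lockd and never calls local_lock(). Here it pretends the server said no. */
static int nfs_rpc_lock(struct lock_req *req)
{
    (void)req;
    return -ENOLCK;
}

static const struct fs_ops ext2_ops = { "ext2", local_lock };
static const struct fs_ops nfs_ops  = { "nfs",  nfs_rpc_lock };

/* What fcntl() or a server-side lockd would do: call whatever the
 * filesystem provides. If the exported fs is itself NFS, the call goes
 * out over the network again, which is the recursion mentioned above. */
int vfs_lock(const struct fs_ops *ops, struct lock_req *req)
{
    return ops->lock(req);
}
```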
lockd has an interesting problem. The semantics of fcntl(F_SETLKW) are
that the process has to sleep until the lock is granted, or a signal
interrupts the sleep. Clearly it's incredibly inefficient for lockd to
spawn a new thread every time it wants to make a lock which would
block. So at first glance, we need a different type of lock -- put the
lock on the list of blocked locks, and return -EAGAIN (-EWOULDBLOCK?).
Then, when that lock is granted, notify lockd that it now has the lock,
so it can pass that notification back to the client and the client
process unblocks.
But what if we simply replace the blocking lock with the would-block
lock? That implies that the caller of ->lock() decides what to do with
the -EWOULDBLOCK return code -- if it's fcntl(), it puts the process
to sleep; if it's lockd, it just carries on.
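Here is a user-space sketch of the would-block lock, assuming a single exclusive lock per file and integer owners; struct file_lock_state, would_block_lock(), release_lock() and record_grant() are all hypothetical. ->lock() never sleeps: a busy lock is queued and -EWOULDBLOCK is returned, and on release the first waiter is granted the lock and told so through a callback supplied by the caller.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Called when a queued request is finally granted. */
typedef void (*grant_fn)(int owner, void *ctx);

struct waiter { int owner; grant_fn notify; void *ctx; };

/* One exclusive owner (0 = unlocked) plus a FIFO of blocked requests. */
struct file_lock_state {
    int owner;
    struct waiter q[MAX_WAITERS];
    size_t nwait;
};

/* The would-block ->lock(): never sleeps. If the lock is busy, queue the
 * request and return -EWOULDBLOCK; the caller decides what to do next. */
int would_block_lock(struct file_lock_state *s, int owner,
                     grant_fn notify, void *ctx)
{
    if (s->owner == 0) {
        s->owner = owner;
        return 0;
    }
    if (s->nwait == MAX_WAITERS)
        return -ENOLCK;
    s->q[s->nwait++] = (struct waiter){ owner, notify, ctx };
    return -EWOULDBLOCK;
}

/* Release: hand the lock to the first waiter (FIFO) and notify it. */
void release_lock(struct file_lock_state *s)
{
    s->owner = 0;
    if (s->nwait == 0)
        return;
    struct waiter w = s->q[0];
    for (size_t i = 1; i < s->nwait; i++)
        s->q[i - 1] = s->q[i];
    s->nwait--;
    s->owner = w.owner;
    if (w.notify)
        w.notify(w.owner, w.ctx);
}

/* Example lockd-style callback: record who was granted, so a GRANTED
 * message could be sent back to the client. */
void record_grant(int owner, void *ctx)
{
    *(int *)ctx = owner;
}
```

An fcntl(F_SETLKW) caller would sleep after -EWOULDBLOCK until its callback woke it; lockd would go straight back to its RPC loop and send the grant from its callback.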
A clustered filesystem might call out to the network and say `I want
to put this lock on this file'. Either some other node in the cluster
says `Denied', `Blocked' or `Granted' (i.e., handles the request), OR no
other node accepts responsibility for the lock, in which case we lock
it locally by calling local_lock().
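The fallback logic might look something like the following. It is a sketch only: the reply codes, cluster_lock(), and the stubs standing in for the network call and for local_lock() are invented for illustration.

```c
/* Hypothetical answers to a cluster-wide "I want this lock" request. */
enum cluster_reply { NO_REPLY, DENIED, BLOCKED, GRANTED };

enum lock_result { LOCK_DENIED, LOCK_BLOCKED, LOCK_GRANTED, LOCK_LOCAL };

/* Stand-in for the network call: returns the responsible node's answer,
 * or NO_REPLY if no node accepts responsibility for the lock. */
typedef enum cluster_reply (*ask_cluster_fn)(long start, long end);

/* Stand-in for local_lock(); assume it grants the lock. */
static enum lock_result local_lock_stub(long start, long end)
{
    (void)start;
    (void)end;
    return LOCK_LOCAL;
}

enum lock_result cluster_lock(ask_cluster_fn ask, long start, long end)
{
    switch (ask(start, end)) {
    case DENIED:  return LOCK_DENIED;
    case BLOCKED: return LOCK_BLOCKED;
    case GRANTED: return LOCK_GRANTED;
    case NO_REPLY:
    default:      return local_lock_stub(start, end);  /* nobody owns it */
    }
}

/* Two example "networks" for trying it out. */
enum cluster_reply nobody_answers(long start, long end)
{
    (void)start; (void)end;
    return NO_REPLY;
}

enum cluster_reply owner_denies(long start, long end)
{
    (void)start; (void)end;
    return DENIED;
}
```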
lockd
lockd does the following to recover from a downed server:
    for_each_lock
        if (belongs_to_my_fs)
            foo();
This requires it to have access to the global list of locks, which is
a bad thing to have anyway.
I've written some replacement code which Trond approved of:
    for_each_inode(sb)
        for_each_lock(inode)
            foo();
With the changes above, even this code can go away. The nfs client can
keep a per-fs list of locks, and reestablish them at server restart.
No need to interact with the local locking at all.
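A sketch of such a per-fs list, assuming a simple singly linked list hung off a hypothetical struct nfs_mount; the NLM reclaim request is replaced by a callback, and all names here are made up.

```c
#include <stddef.h>

/* Hypothetical record of one lock the client holds on the server. */
struct client_lock {
    unsigned long fileid;   /* which file on the server */
    long start, end;
    int type;
    struct client_lock *next;
};

/* Hypothetical per-mount state: the client's own list, not locks.c's. */
struct nfs_mount {
    struct client_lock *locks;
};

void remember_lock(struct nfs_mount *m, struct client_lock *l)
{
    l->next = m->locks;
    m->locks = l;
}

/* Stand-in for sending a reclaim request; returns 0 on success. */
typedef int (*reclaim_fn)(const struct client_lock *l);

/* After the server restarts, replay every lock held on this mount.
 * Returns the number of locks that could not be re-established. */
int reclaim_all(struct nfs_mount *m, reclaim_fn send)
{
    int failed = 0;
    for (struct client_lock *l = m->locks; l; l = l->next)
        if (send(l) != 0)
            failed++;
    return failed;
}

/* Example reclaim callback that just counts what it was asked to send. */
int reclaims_sent = 0;

int count_reclaim(const struct client_lock *l)
{
    (void)l;
    reclaims_sent++;
    return 0;
}
```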
Non-POSIX locks
We already provide five different lock types:
* 4.4 BSD flock() locks. These are whole-file locks which are
inherited across a fork(). They are not checked for deadlock.
* POSIX fcntl() locks. These are byte-range locks which are
inherited across an exec(), but not a fork(). They are checked for
deadlock. (A subtype of this is the LFS variant which allows for
64-bit offsets.)
* Leases. These are whole-file locks which are broken when another
process attempts to open the file for an operation which would
conflict with the lease type. When they are broken, the owner
receives a signal and must ensure the file is in a consistent
state before releasing their lease.
* Share modes. These are whole-file mandatory locks. No other
process may open a file which would conflict with the Share Mode
on the file. Use flock() with the LOCK_MAND flag to set a Share
Mode.
* Mandatory Locks. These are byte-range mandatory locks. To use
them, mount the filesystem with the `mand' option enabled, and set
the file mode to g-x g+s. POSIX locks applied to this file will
now be mandatory. Mandatory locks do not prevent accesses via
mmap(). You should not use Mandatory locks in new code.
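The fork() behaviour of the first two types can be checked from user space. This sketch assumes Linux and a writable /tmp, and the helper names are invented. For flock(), the child shares the parent's open file description, so a non-blocking flock() on the inherited fd succeeds while a fresh open() in the child conflicts. For fcntl(), F_GETLK on the inherited fd reports the parent's lock as a conflict, because the child does not own it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*probe_fn)(int fd, const char *path);

/* Run probe in a child and return its exit status, or -1 on error. */
static int in_child(probe_fn probe, int fd, const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(probe(fd, path));
    int st;
    if (waitpid(pid, &st, 0) < 0 || !WIFEXITED(st))
        return -1;
    return WEXITSTATUS(st);
}

/* 1 if the child already owns the parent's flock() lock via the inherited fd. */
static int probe_flock(int fd, const char *path)
{
    int fresh = open(path, O_RDWR);
    if (fresh < 0)
        return 2;
    int shared_ok = flock(fd, LOCK_EX | LOCK_NB) == 0;    /* same description */
    int fresh_ok = flock(fresh, LOCK_EX | LOCK_NB) == 0;  /* new description */
    close(fresh);
    return shared_ok && !fresh_ok;
}

/* 1 if the parent's fcntl() lock conflicts with the child, i.e. not inherited. */
static int probe_fcntl(int fd, const char *path)
{
    (void)path;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    if (fcntl(fd, F_GETLK, &fl) < 0)
        return 2;
    return fl.l_type != F_UNLCK;
}

/* Lock a temp file with flock() or fcntl() (whole file), then probe in a child. */
static int run(int use_flock)
{
    char path[] = "/tmp/forkdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int locked;
    if (use_flock) {
        locked = flock(fd, LOCK_EX | LOCK_NB) == 0;
    } else {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        locked = fcntl(fd, F_SETLK, &fl) == 0;
    }
    int r = locked ? in_child(use_flock ? probe_flock : probe_fcntl, fd, path) : -1;
    close(fd);
    unlink(path);
    return r;
}

int flock_inherited(void)     { return run(1); }
int fcntl_not_inherited(void) { return run(0); }
```

Both flock_inherited() and fcntl_not_inherited() are expected to return 1 on Linux.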
The proposal mentioned above would add a sixth -- whatever the
filesystem supports. Ncpfs already does this through an ioctl, but
that could be supported `natively' through this new scheme.
I want to add another byte-range lock, which looks and smells like a
POSIX fcntl lock except that it is not removed by closing any fd which
happens to be open on this file. To work around this stupidity in the
spec, Samba keeps a list of open fds which are no longer in use but
refer to files that still hold locks, since closing them would drop
those locks. I'd like the external interface to this to be two new
fcntl() commands, F_SETLK_NP and F_SETLKW_NP. F_GETLK clearly does not
need to be altered or replaced.
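Linux did later gain a lock with roughly these semantics: open file description (OFD) locks, added in 3.15 as F_OFD_SETLK, F_OFD_SETLKW and F_OFD_GETLK. They are not the F_SETLK_NP interface proposed here; an OFD lock belongs to the open file description, so it is released when the last fd referring to that description is closed, but not by closing some unrelated fd. The sketch below (helper names invented, Linux and a writable /tmp assumed) shows such a lock surviving close(dup(fd)).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Check for a write lock on byte 0 from a new open of path. A new open()
 * creates a new open file description, i.e. a different OFD lock owner.
 * Returns 1 if locked, 0 if not, -1 on error. */
static int locked_from_new_open(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1, .l_pid = 0 };
    int r = fcntl(fd, F_OFD_GETLK, &fl) < 0 ? -1 : fl.l_type != F_UNLCK;
    close(fd);  /* a different description: does not touch the lock below */
    return r;
}

int ofd_lock_survives_close_dup(void)
{
    char path[] = "/tmp/ofddemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    /* l_pid must be 0 for the F_OFD_* commands. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1, .l_pid = 0 };
    int r = -1;
    if (fcntl(fd, F_OFD_SETLK, &fl) == 0) {
        close(dup(fd));  /* would drop a classic POSIX lock; the OFD lock stays */
        r = locked_from_new_open(path);
    }
    close(fd);  /* last reference to the description: lock released here */
    unlink(path);
    return r;
}
```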
Restructuring
locks.c still runs almost entirely under the BKL. An earlier attempt
to move it to a different locking scheme was thwarted when the code
was integrated into 2.4.0-test9 while I was on holiday, and without me
submitting it to Linus. Grumble. I plan to move it to _one_ spinlock
to cover all lock-related structures, and I think that will be
possible with the plan described above (since this code will no longer
sleep).
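To illustrate the shape of the single-lock scheme, here is a user-space sketch that uses a C11 atomic_flag as a stand-in for a kernel spinlock, protecting a hypothetical per-inode lock list. Spinning is only acceptable because nothing done while holding the lock sleeps, which is the point made above.

```c
#include <stdatomic.h>
#include <stddef.h>

/* One global guard for all lock-related structures (toy spinlock). */
static atomic_flag file_lock_guard = ATOMIC_FLAG_INIT;

static void guard_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&file_lock_guard,
                                             memory_order_acquire))
        ;  /* spin: holders never sleep, so the wait is short */
}

static void guard_unlock(void)
{
    atomic_flag_clear_explicit(&file_lock_guard, memory_order_release);
}

/* Hypothetical per-inode lock records. */
struct file_lock_rec {
    struct file_lock_rec *next;
    long start, end;
};

struct inode_locks {
    struct file_lock_rec *head;  /* protected by file_lock_guard */
};

void add_lock(struct inode_locks *in, struct file_lock_rec *l)
{
    guard_lock();
    l->next = in->head;
    in->head = l;
    guard_unlock();
}

size_t count_locks(struct inode_locks *in)
{
    size_t n = 0;
    guard_lock();
    for (struct file_lock_rec *l = in->head; l; l = l->next)
        n++;
    guard_unlock();
    return n;
}
```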
As soon as lockd no longer needs to keep its fingers inside locks.c, I
want to remove the global list of locks. It's also used by /proc/locks
-- which probably needs to go away anyway. So what's useful about
/proc/locks? I'd like to be able to see which locks my process has,
and which processes have a lock on a given file. The former is easy --
/proc/$PID/locks can be constructed relatively easily from the fds
that process has open. The latter? I don't know. Ideas welcome.
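Until something like /proc/$PID/locks exists, a rough user-space approximation of "which locks does my process have" is to filter /proc/locks by PID. This sketch assumes the line layout of later kernels (index, type, mode, access, PID, ...), with blocked requests marked by "->"; the details may differ between kernel versions, and the function names are invented.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Parse the PID field of one /proc/locks line, e.g.
 *   "1: POSIX  ADVISORY  WRITE 3163 08:02:1193521 0 EOF"
 * Blocked requests carry an extra "->" after the index.
 * Returns the PID, or -1 if the line does not parse. */
long locks_line_pid(const char *line)
{
    const char *p = strchr(line, ':');
    if (!p)
        return -1;
    p++;
    while (*p == ' ')
        p++;
    if (strncmp(p, "->", 2) == 0)
        p += 2;
    char kind[16], mode[16], access[16];
    long pid;
    if (sscanf(p, "%15s %15s %15s %ld", kind, mode, access, &pid) != 4)
        return -1;
    return pid;
}

/* Count the lines of a /proc/locks-format stream that belong to pid. */
int count_locks_for_pid(FILE *f, long pid)
{
    char line[256];
    int n = 0;
    while (fgets(line, sizeof line, f))
        if (locks_line_pid(line) == pid)
            n++;
    return n;
}
```

For the current process this would be called as count_locks_for_pid(fopen("/proc/locks", "r"), getpid()), with error checking added. Answering the other question (which processes lock a given file) would mean matching the device:inode field instead of the PID.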
Links
[1] POSIX file locking
[2] Olaf Kirch's page on NLM (warning: out of date)
_________________________________________________________________
Matthew Wilcox <matthew@wil.cx>
References
1. http://www.opengroup.org/onlinepubs/007908799/xsh/fcntl.html
2. http://www.swb.de/personal/okir/lockd.html