Date:	Sun, 29 Oct 2000 15:15:57 +0100 (CET)
From:	Andreas Gruenbacher <ag@bestbits.at>
To:	"Stephen C. Tweedie" <sct@redhat.com>,
Subject: [PROPOSAL2] Extended Attributes

Hi again,


There were some good arguments for adding a few more features to the EA
interface. This new proposal reflects some of the discussion.

I still decline to support forks/streams through the EA interface. IMHO
that's just the wrong way to go. This doesn't preclude an EA
implementation on top of streams, of course.

The interface described here also doesn't include Stephen's idea to allow
an ordered list of EA's under the same name. In addition to the append and
prepend operations Stephen suggested, a whole range of other operations
(get/delete/... by index, etc.) might make sense, and stuff like that
could well be added. However, it would complicate the semantics even
further. I'd really like to learn more about the requirements for that.

Stephen, do you have any good pointers?

We have also been discussing how to support different EA namespaces.
Stephen's approach was to use an integer namespace id to specify the
namespace, while my approach was to use a textual prefix to the EA name.
While those approaches are semantically equivalent, I have been convinced
that an integer specifier is easier to handle in the kernel.

Still, I believe in textual names at the user interface. I think the id's
should be translated from/into textual names in a userspace library before
presenting them to users.
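
A minimal sketch of how such a userspace translation could look, assuming
a small static table. The namespace ids, the names "user" and "system",
and the helpers ea_ns_name() and ea_ns_id() are hypothetical; the proposal
does not define any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace ids and names; the proposal does not define any. */
enum { EA_NS_USER = 1, EA_NS_SYSTEM = 2 };

static const struct {
	int		id;
	const char	*name;
} ea_ns_table[] = {
	{ EA_NS_USER,   "user"   },
	{ EA_NS_SYSTEM, "system" },
};

#define EA_NS_COUNT (sizeof(ea_ns_table) / sizeof(ea_ns_table[0]))

/* Returns the textual name for a namespace id, or NULL if unknown. */
const char *ea_ns_name(int id)
{
	size_t i;

	for (i = 0; i < EA_NS_COUNT; i++)
		if (ea_ns_table[i].id == id)
			return ea_ns_table[i].name;
	return NULL;
}

/* Returns the namespace id for a textual name, or -1 if unknown. */
int ea_ns_id(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < EA_NS_COUNT; i++)
		if (strcmp(ea_ns_table[i].name, name) == 0)
			return ea_ns_table[i].id;
	return -1;
}
```

With a table like this, only the library needs to change when a new
namespace is introduced; the kernel keeps dealing in integers.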

One of the issues raised was that it's important to be able to manipulate
multiple EA's at once. The reason for this was to reduce system call
overhead.

Another idea was to allow manipulation of multiple EA's in an atomic way.
If I recall correctly an even stronger semantic requirement of
manipulating multiple EA's in a transactional way was also suggested.

Another issue was that the current proposal at <http://acl.bestbits.at>
has a race condition between GET and SET operations. This is also
addressed here.

Meanwhile I did a little background reading on NFSv4 since some complained
the ACL implementation was too limited. I will not address that here, but
the NFSv4 spec gave me a new idea of how the EA interface could support
all of the above in a clean, simple and extensible way. NFSv4 supports
compound operations, in which multiple requests are packed into a single
RPC. A similar approach might also make sense for the EA interface.

Note that the interface proposed here is comparable to Tru64's property
lists interface (although it goes beyond that). The Tru64 proplist(4)
manual page is here:
<http://www.tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V50_HTML/MAN/MAN4/0200____.HTM>


I could imagine the system call(s) to be implemented like this:

int sys_ext_attr_file(char *path, int namespace, int flags,
                      struct ea_request *request, size_t request_len,
                      int *results, size_t result_size);

int sys_ext_attr_fd(int fd, int namespace, int flags,
                      struct ea_request *request, size_t request_len,
                      int *results, size_t result_size);

(As written, these would not actually work as system calls because they
take too many parameters.)

Multiple EA operations are marshalled into the request buffer; after the
system call the results buffer contains the results. Operations are
encoded in the request buffer as variable-size records with this
structure:

struct ea_request {
	int	operation;
	/* additional operation specific fields */
};

Results just consist of one integer status code per operation.

Operation could be one of:

  EA_REQ_LIST
    List the names of all EA's defined for this inode.
  EA_REQ_GET
    Get the value of an EA.
  EA_REQ_GETSIZE
    Get the buffer size required for storing the value of an EA.
  EA_REQ_SET
    Set the value of an EA to a new value.
  EA_REQ_REMOVE
    Remove an EA.

  EA_REQ_VERIFY
    Compare the current EA value with the value passed.
  EA_REQ_GET_COOKIE
    Get a cookie that corresponds to the current EA state.
  EA_REQ_VERIFY_COOKIE
    Compare an inode's current cookie with the cookie passed.

  (more on the last three below)

For some requests/results, additional parameters are needed:

struct ea_req_list {
	int	operation = EA_REQ_LIST;
	size_t	buffer_size;
	struct ea_entry	*entries;
	size_t	offset;
};

struct ea_req_get {
	int	operation = EA_REQ_GET;
	int	op_flags;
	size_t	*buffer_size;
	char	*buffer;
	unsigned short name_len;
	char	name[];  /* size padded to machine word size */
};

struct ea_req_set {
	int	operation = EA_REQ_SET;
	int	op_flags;
	size_t	value_len;
	char	*value;
	unsigned short name_len;
	char	name[];  /* size padded to machine word size */
};

struct ea_req_compare {
	int	operation = EA_REQ_VERIFY;
	int	op_flags;
	size_t	value_len;
	char	*value;
	unsigned short name_len;
	char	name[];  /* size padded to machine word size */
};

struct ea_req_get_cookie {
	int	operation = EA_REQ_GET_COOKIE;
	size_t	*buffer_size;
	char	*buffer;
};

struct ea_req_compare_cookie {
	int	operation = EA_REQ_VERIFY_COOKIE;
	size_t	value_len;
	char	*value;
};

The EA_REQ_LIST operation can pass attribute names as variable length
records. With an integer namespace identifier the previous
"name1\0name2\0name3\0\0" format isn't suitable anymore, so this format
can be used instead:

struct ea_entry {
	int	namespace;
	unsigned short name_len;
	char	name[];  /* size padded to machine word size */
};

Names are still zero terminated strings.
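
A minimal sketch of how a caller might build and step through such a list.
It assumes that name_len excludes the terminating zero and that every
record is padded so the next one starts on a machine word boundary; the
proposal does not spell this out. The helpers ea_entry_size(),
ea_entry_put() and ea_entry_next() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ea_entry {
	int	namespace;
	unsigned short name_len;
	char	name[];  /* size padded to machine word size */
};

/* Assumption: name_len excludes the terminating zero; the record size is
 * rounded up to a multiple of the machine word size. */
static size_t ea_entry_size(unsigned short name_len)
{
	size_t raw = offsetof(struct ea_entry, name) + name_len + 1;
	size_t word = sizeof(long);

	return (raw + word - 1) / word * word;
}

/* Appends one record at buf + used; returns the new fill level, or 0 if
 * the record does not fit. */
size_t ea_entry_put(char *buf, size_t used, size_t size, int ns,
		    const char *name)
{
	size_t len = strlen(name);
	unsigned short name_len = (unsigned short)len;
	size_t need = ea_entry_size(name_len);
	char *rec = buf + used;

	assert(len <= 0xffff);
	if (used > size || need > size - used)
		return 0;
	memset(rec, 0, need);  /* zero padding and terminator */
	memcpy(rec + offsetof(struct ea_entry, namespace), &ns, sizeof(ns));
	memcpy(rec + offsetof(struct ea_entry, name_len), &name_len,
	       sizeof(name_len));
	memcpy(rec + offsetof(struct ea_entry, name), name, len);
	return used + need;
}

/* Returns the name of the record at *off and advances *off past it, or
 * returns NULL when no complete record is left. */
const char *ea_entry_next(const char *buf, size_t used, size_t *off, int *ns)
{
	size_t hdr = offsetof(struct ea_entry, name);
	unsigned short name_len;
	const char *rec = buf + *off;

	if (*off > used || used - *off < hdr)
		return NULL;
	memcpy(&name_len, rec + offsetof(struct ea_entry, name_len),
	       sizeof(name_len));
	if (ea_entry_size(name_len) > used - *off)
		return NULL;  /* truncated record */
	memcpy(ns, rec + offsetof(struct ea_entry, namespace), sizeof(*ns));
	*off += ea_entry_size(name_len);
	return rec + hdr;
}
```

The header fields are copied with memcpy() rather than accessed through a
cast so that the sketch does not depend on the alignment of the buffer.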

With this approach, multiple EA operations can be implemented without too
much system call overhead. Of course, implementing this is much more
complicated than the previous proposal.

The marshalling/buffer management/etc. would ideally be handled by a
library, instead of dealing with that in each application separately.
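
As an illustration of what such a library might do, here is a minimal
sketch that appends one EA_REQ_SET record to a request buffer. The
numeric value of EA_REQ_SET, the struct ea_req_buf bookkeeping type and
the ea_buf_add_set() helper are assumptions made for the example, as is
the choice that name_len excludes the terminating zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EA_REQ_SET 4  /* hypothetical value; the proposal assigns none */

struct ea_req_set {
	int	operation;
	int	op_flags;
	size_t	value_len;
	char	*value;
	unsigned short name_len;
	char	name[];  /* size padded to machine word size */
};

/* Hypothetical bookkeeping for a request buffer under construction. */
struct ea_req_buf {
	char	*data;
	size_t	used;
	size_t	size;
	int	count;   /* number of records marshalled so far */
};

/* Appends one set request; returns 0 on success, -1 if it does not fit. */
int ea_buf_add_set(struct ea_req_buf *b, int op_flags, const char *name,
		   char *value, size_t value_len)
{
	struct ea_req_set hdr;
	size_t hdr_len = offsetof(struct ea_req_set, name);
	size_t name_len = strlen(name);
	size_t word = sizeof(long);
	size_t need = (hdr_len + name_len + 1 + word - 1) / word * word;

	assert(b->used <= b->size);
	if (name_len > 0xffff || need > b->size - b->used)
		return -1;

	memset(&hdr, 0, sizeof(hdr));
	hdr.operation = EA_REQ_SET;
	hdr.op_flags = op_flags;
	hdr.value_len = value_len;
	hdr.value = value;  /* the value itself stays in the caller's memory */
	hdr.name_len = (unsigned short)name_len;

	memset(b->data + b->used, 0, need);  /* zero padding and terminator */
	memcpy(b->data + b->used, &hdr, hdr_len);
	memcpy(b->data + b->used + hdr_len, name, name_len);
	b->used += need;
	b->count++;
	return 0;
}
```

An application would call one such helper per operation and then pass
data and used to the system call as request and request_len.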

The default semantics would be to process the requests in sequence,
aborting at the first request that fails. The system call itself could
return the number of requests processed successfully.
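
The default semantics can be sketched as a simple loop. To keep the
example short, each request is modelled as an already decoded handler
function rather than a variable-size record; struct ea_decoded_req,
ea_process() and the two sample handlers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one decoded request: a handler that returns
 * 0 on success or a negative errno value on failure. */
typedef int (*ea_handler_t)(void *arg);

struct ea_decoded_req {
	ea_handler_t	handler;
	void		*arg;
};

/*
 * Runs the requests in order and stores one status code per processed
 * request in results[]. Stops at the first failure and returns the
 * number of requests that were processed successfully.
 */
int ea_process(const struct ea_decoded_req *reqs, size_t nreqs,
	       int *results, size_t nresults)
{
	size_t i;

	assert(nresults >= nreqs);
	for (i = 0; i < nreqs; i++) {
		results[i] = reqs[i].handler(reqs[i].arg);
		if (results[i] != 0)
			break;
	}
	return (int)i;
}

/* Two trivial handlers used only to exercise the loop. */
int ea_op_ok(void *arg)   { (void)arg; return 0; }
int ea_op_fail(void *arg) { (void)arg; return -ENOENT; }
```

In this sketch the status slots of requests after the failing one are
left untouched, so the caller can tell them apart only by the return
value.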

In the flags parameter to the system call, users could request additional
restrictions that might be supported by the implementation or not, like
the following:

  EA_FLAG_ISOLATED
    The requests are processed without other processes seeing any
    intermediate steps.

  EA_FLAG_ATOMIC
    Either all requests or none of the requests is processed.

  EA_FLAG_SYNC
    The EA's are guaranteed to be persistent on disk when the system
    call returns.

An implementation would be free to use EA_FLAG_ISOLATED or EA_FLAG_SYNC
semantics even though the corresponding flags were not set. As
EA_FLAG_ATOMIC is a very strong requirement, most current filesystems
probably wouldn't support it.

The op_flags member of individual operations could include:

  EA_OP_FLAG_CREATE
    The operation only succeeds if the EA doesn't exist already.

  EA_OP_FLAG_EXISTS
    The operation only succeeds if the EA exists already.
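
A minimal sketch of the check a set operation might perform for these
two flags. The flag values, the error codes and the decision to reject
both flags together are assumptions; the proposal only names the flags.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values; the proposal names the flags only. */
#define EA_OP_FLAG_CREATE 0x1
#define EA_OP_FLAG_EXISTS 0x2

/* Returns 0 if the operation may proceed, or a negative errno value. */
int ea_check_op_flags(int op_flags, int ea_exists)
{
	/* Assumption: asking for both flags at once is treated as an error. */
	if ((op_flags & EA_OP_FLAG_CREATE) && (op_flags & EA_OP_FLAG_EXISTS))
		return -EINVAL;
	if ((op_flags & EA_OP_FLAG_CREATE) && ea_exists)
		return -EEXIST;  /* EA must not exist yet */
	if ((op_flags & EA_OP_FLAG_EXISTS) && !ea_exists)
		return -ENOENT;  /* EA must exist already */
	return 0;
}
```

With neither flag set, the operation succeeds whether or not the EA
exists.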

About the EA_REQ_VERIFY, EA_REQ_GET_COOKIE and EA_REQ_VERIFY_COOKIE
operations: the problem with a simple GET followed by a SET at some later
point in time is that another process might in the meantime also have
manipulated the very same EA. For relative changes to an EA, that's bad.
Arbitrary interleavings of GET/SET operations lead to unpredictable
results. The two approaches to get around that I'm currently aware of are:
either check that the previous value hasn't changed, or check that a
magic cookie (some sort of version tag) hasn't changed in the meantime.

For the comparison approach, one correct implementation would be an atomic
sequence of the two operations [EA_REQ_VERIFY, EA_REQ_SET]
(EA_FLAG_ISOLATED), resulting in an atomic test-and-set operation:
EA_REQ_VERIFY would compare the value retrieved in the EA_REQ_GET
operation with the current EA value and abort if the previous value has
changed in the meantime. This operation is expensive if the EA value gets
big.
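
The comparison approach can be sketched against an in-memory stand-in for
a single EA. struct ea_value, the helper names and the -EAGAIN error code
are assumptions made for the example. The sketch is single-threaded, so
isolation comes for free; a real implementation would have to hold a lock
across both steps.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one EA value. */
struct ea_value {
	char	data[64];
	size_t	len;
};

/* Models EA_REQ_VERIFY: succeeds only if the stored value matches. */
int ea_verify(const struct ea_value *ea, const void *value, size_t len)
{
	assert(ea != NULL);
	if (ea->len != len || memcmp(ea->data, value, len) != 0)
		return -EAGAIN;  /* assumption: error code chosen for the sketch */
	return 0;
}

/* Models EA_REQ_SET. */
int ea_set(struct ea_value *ea, const void *value, size_t len)
{
	if (len > sizeof(ea->data))
		return -ENOSPC;
	memcpy(ea->data, value, len);
	ea->len = len;
	return 0;
}

/* Models the sequence [EA_REQ_VERIFY, EA_REQ_SET] with EA_FLAG_ISOLATED:
 * the set is only attempted if the verify step succeeded. */
int ea_test_and_set(struct ea_value *ea, const void *old, size_t old_len,
		    const void *new_value, size_t new_len)
{
	int err = ea_verify(ea, old, old_len);

	if (err)
		return err;
	return ea_set(ea, new_value, new_len);
}
```

The memcmp() in ea_verify() is where the cost grows with the size of the
EA value.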

The other approach would be to retrieve a magic cookie together with the
original value (this could be a simple integer). Instead of comparing the
values, the previous and current cookies are compared. The cookie
associated with an inode doesn't have to be related to the individual
attribute, it must only be guaranteed that the cookie changes when that EA
changes. Operation sequences: [EA_REQ_GET_COOKIE, EA_REQ_GET] (no flags
required), and at some later point in time: [EA_REQ_VERIFY_COOKIE,
EA_REQ_SET] (EA_FLAG_ISOLATED).
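
A minimal sketch of the cookie approach for a relative update, again
against an in-memory stand-in. struct ea_inode, the helper names and the
-EAGAIN error code are hypothetical, and the EA value is a plain integer
to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode carrying one integer EA. */
struct ea_inode {
	unsigned long	version;  /* plays the role of the cookie */
	long		counter;  /* the EA value, an integer for brevity */
};

/* Models the sequence [EA_REQ_GET_COOKIE, EA_REQ_GET]. */
void ea_get_with_cookie(const struct ea_inode *ino, unsigned long *cookie,
			long *value)
{
	assert(ino != NULL);
	*cookie = ino->version;
	*value = ino->counter;
}

/* Models [EA_REQ_VERIFY_COOKIE, EA_REQ_SET] with EA_FLAG_ISOLATED. */
int ea_set_if_cookie(struct ea_inode *ino, unsigned long cookie, long value)
{
	if (ino->version != cookie)
		return -EAGAIN;  /* assumption: error code chosen for the sketch */
	ino->counter = value;
	ino->version++;          /* every change must change the cookie */
	return 0;
}

/* Relative update: retry until no other writer got in between. */
int ea_increment(struct ea_inode *ino, long delta)
{
	unsigned long cookie;
	long value;
	int err;

	do {
		ea_get_with_cookie(ino, &cookie, &value);
		err = ea_set_if_cookie(ino, cookie, value + delta);
	} while (err == -EAGAIN);
	return err;
}
```

Comparing cookies costs the same regardless of how large the EA value
is, which is the main advantage over comparing values.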

I don't know if any protocols support the value comparison approach but
not the cookie approach. AFAIK NFSv4 supports neither, but a
verify operation can be followed by a set operation in a single RPC
request, so at least the time window for inconsistencies gets minimized.

For local filesystems, the cookie approach seems pretty easy to implement.
The i_version field that is present in each in-memory inode can directly
be used as the cookie. Cookies don't need to be stored on the filesystem.

The cookie could also be used for the EA_REQ_LIST operation to retrieve
very long lists of EA names across multiple system calls reliably.


Regards,
Andreas.

------------------------------------------------------------------------
 Andreas Gruenbacher, a.gruenbacher@computer.org
 Contact information: http://www.bestbits.at/~ag/

