Date: Tue, 24 Oct 2000 12:21:04 +0100
From: "Stephen C. Tweedie" <sct@redhat.com>
To: Andreas Gruenbacher <ag@bestbits.at>
Subject: Re: [PROPOSAL] Extended attributes for Posix security extensions

Hi,

On Sun, Oct 22, 2000 at 04:23:53PM +0200, Andreas Gruenbacher wrote:
>
> This is a proposal to add extended attributes to the Linux kernel.
> Extended attributes are name/value pairs associated with inodes.

What timing --- we had a long discussion (actually, several!) about
this very topic at the Miami storage workshop last week.

One of the main goals we had in getting people together to talk about
extended attributes in general, and ACLs in particular, was to deal
with the API issues cleanly. In particular, we really want the API to
be at the same time:

 * General enough to deal with all of the existing,
   mutually-incompatible semantics for ACLs and attributes; and

 * Specific enough to define the requested semantics unambiguously
   for any one given implementation of the underlying attributes.

These points are really important. We have people wanting to add ACL
support to ext2 in a manner which Samba can use --- can we do POSIX
ACLs with NT SID identifiers rather than with Unix user IDs? If we
mount an NTFS filesystem, it will have native ACLs on disk. How does
the API specify that we want NT semantics, not POSIX semantics, for a
given request?

There is also the naming issue. There are multiple independent
namespaces. For extended attributes, there may be totally separate
namespaces for user attributes and for system ones, or there may be a
common namespace with per-attribute system status. Again, these
different sets of semantics _already exist_ on filesystems which Linux
can mount (e.g. NTFS, JFS and XFS), so the API has to deal with them.

There is already a kernel API which has this flexibility: the BSD
socket API handles these issues through the concepts of protocol
families and address families.
Those same two concepts are perfectly matched to the extended
attributes problem.

The proposal defines two "families" of attribute entities: attribute
families and name families.

An attribute family might be ATR_USER or ATR_SYSTEM to specify that we
are dealing with arbitrary user or system named extended attributes,
or ATR_POSIXACL to specify POSIX-semantics ACLs. Obviously, this can
be extended to other ACL semantics without revving the API --- a new
attribute family would be all that is needed.

The "name family" is the other part of the equation. Attributes in
the ATR_USER or ATR_SYSTEM families might be named with counted
strings, so they would have names in the ANAME_STRING name family.
POSIX ACLs, however, have a different namespace: ANAME_UID or
ANAME_GID.

The API cleanly deals with the difference between user and group ACLs.
It also makes it easy to add support later on for more complex
operations: if we want to add NT SID support to ext2 ACLs so that
Samba and local accesses get the same access control, we can pass
ANAME_NTSID names to the ATR_POSIXACL attribute family without
changing the API.

Obviously the combinations of name types supported for any given
attribute family will depend on the underlying implementation, but
that's the whole point --- the API is expressive enough to define
unambiguously what the application is trying to do, so that if the
underlying filesystem doesn't support (say) POSIX ACLs, we get an
error back telling us so rather than attempting to do an incomplete
map of the POSIX request onto whatever the underlying filesystem
happens to support.

Before we look at the syscall API in detail, there's one other point
to note. It is common to want to read or set one individual attribute
in isolation (even if it is an atomic set-and-get which is being
performed on that attribute). Sometimes, however, you want to access
the entire set of related attributes as an ordered list.
ACLs are the obvious case: if you have underlying semantics which
allow you to mix both PASS and DENY ACLs on a file, then the order of
the ACLs obviously matters. In such cases, you may sometimes want
just to query or set the ACL for a specific user, but often you will
want to do something more complex such as change the order of ACLs on
the list or replace the entire list as a single entity --- and you
want to do so atomically.

So, the simple "SET" and "GET" operations on named attributes (which
correspond to writing and reading the ACLs for specific named users in
the ATR_POSIXACL family) need to be augmented with SET variants which
append or prepend to the ACL list, or which atomically replace the old
ACL list in its entirety.

Our proposed kernel API looks something like this:

    sys_setattr (char *filename, int attrib_family, int op,
                 struct attrib *old_attribs, int *old_lenp,
                 struct attrib *new_attribs, int new_len);
    sys_fsetattr(int fd, int attrib_family, int op,
                 struct attrib *old_attribs, int *old_lenp,
                 struct attrib *new_attribs, int new_len);

where <op> can be

    ATR_SET         overwrite existing attribute
    ATR_GET         read existing attribute
    ATR_GETALL      read entire ordered attribute list (ignores new val)
    ATR_PREPEND     add new attribute to start of ordered list
    ATR_APPEND      add new attribute to end of ordered list
    ATR_REPLACE     replace entire ordered attribute list

and where <attribs> is a buffer of length <len> bytes of variable
length struct attrib records:

    struct attrib {
            int rec_len;            /* Length of the whole record:
                                       should be padded to long
                                       alignment */
            int name_family;        /* Which namespace is the name in? */
            int name_len;
            int val_len;
            char name[variable];    /* byte-aligned */
            char val[variable];     /* byte-aligned */
    };

ATR_SET will overwrite an existing attribute, or if the attribute does
not already exist, will append the new attribute (i.e. it does not
override existing ACL controls, in keeping with the Principle of Least
Surprise).
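
To make the record layout concrete, here is a rough sketch (not part
of the proposal) of how userspace might pack one struct attrib record
into a buffer, rounding rec_len up to long alignment. The split into
a fixed struct attrib_hdr, the helper names attrib_pack and
pad_to_long, and the numeric value of ANAME_STRING are all assumptions
made for illustration; the proposal fixes none of them.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value: the proposal names ANAME_STRING but assigns no number. */
enum { ANAME_STRING = 1 };

/* Assumed fixed part of a record; the name and val bytes follow it directly. */
struct attrib_hdr {
    int rec_len;      /* length of the whole record, padded to long alignment */
    int name_family;  /* which namespace the name is in */
    int name_len;
    int val_len;
};

/* Round n up to a multiple of sizeof(long), which is a power of two. */
static size_t pad_to_long(size_t n)
{
    return (n + sizeof(long) - 1) & ~(sizeof(long) - 1);
}

/*
 * Pack one record at buf. Returns the padded record length, or 0 if the
 * record does not fit in buflen bytes.
 */
static size_t attrib_pack(void *buf, size_t buflen, int name_family,
                          const void *name, size_t name_len,
                          const void *val, size_t val_len)
{
    struct attrib_hdr hdr;
    unsigned char *p = buf;
    size_t raw, rec;

    assert(buf != NULL && name != NULL);

    /* Reject oversized parts first so the sums below cannot wrap around. */
    if (buflen > INT_MAX || name_len > buflen || val_len > buflen)
        return 0;

    raw = sizeof(hdr) + name_len + val_len;
    rec = pad_to_long(raw);
    if (rec > buflen)
        return 0;

    hdr.rec_len = (int)rec;
    hdr.name_family = name_family;
    hdr.name_len = (int)name_len;
    hdr.val_len = (int)val_len;

    /* memcpy avoids any alignment assumptions about buf. */
    memcpy(p, &hdr, sizeof(hdr));
    memcpy(p + sizeof(hdr), name, name_len);
    if (val_len > 0)
        memcpy(p + sizeof(hdr) + name_len, val, val_len);
    memset(p + raw, 0, rec - raw);  /* zero the alignment padding */

    return rec;
}
```

Several records would presumably be packed back to back by advancing
the buffer pointer by each returned length, with the running total
passed as new_len.
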
If multiple instances of the name already exist, ATR_SET replaces the
first one and deletes the subsequent ones. If supplied with an "old"
buffer, all old attributes of that name will be returned.

For the PREPEND/APPEND/REPLACE operations, the entire old attribute
set is returned.

For GET, the <new> specification is read and all attributes which
match any items in <new> are returned, in the order in which they are
specified in <new>. The actual value in <new> is ignored; only the
name is used.

For GETALL, <new> is ignored entirely.

*old_lenp should contain the size of the old attributes buffer on
entry. It will contain the number of valid bytes in the old buffer on
exit. If the buffer is not sufficiently large to contain all of the
attributes, E2BIG is returned.

This is just a first stab at documenting what feels like an
appropriate API. It should be extensible enough for the future, but
is pretty easy to code to already --- existing filesystems don't have
to deal with any complexity they don't want to. Additionally, the use
of well-defined namespaces for attributes means that in the future we
can implement things like common code for generic attribute caching,
or process authentication groups for non-Unix-ID authentication
tokens, without having to duplicate all of that work in each
individual filesystem.

The extended attribute patch currently on the acl-devel group simply
doesn't give us the ability to do extended attributes on any
filesystem other than ext2, because it has such specific semantics.
I'd rather avoid that, and I'd rather do so without adding a profusion
of different ACL and attribute syscalls in the process.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org