Date: Sun, 22 Oct 2000 16:23:53 +0200 (CEST)
From: Andreas Gruenbacher <ag@bestbits.at>
To: linux-fsdevel@vger.kernel.org
Subject: [PROPOSAL] Extended attributes for Posix security extensions

Hello,

This is a proposal to add extended attributes to the Linux kernel.
Extended attributes are name/value pairs associated with inodes.

A patch implementing extended attribute system calls, the VFS
interface, and code for the ext2 filesystem is available. All material
is GPL and/or LGPL licensed. The core implementation has been fairly
stable for months. New to the code base is a mechanism for sharing
extended attribute blocks among inodes on the ext2 filesystem. This is
a significant optimization. The block sharing code might still contain
bugs. It can be deactivated at kernel compile time.

Another patch that implements access control lists on top of extended
attributes is also available. This code has been in production on
several of my i386-based systems for a couple of months.

Here I will try to give an overview of the architecture and
implementation of extended attributes. The access control lists patch
is not discussed in detail. More information is available in manual
pages and other documents on the web site, <http://acl.bestbits.at/>.

I would like to discuss these issues with the final goal of getting
the code into the standard kernel. Comments welcome...


WHAT ARE EXTENDED ATTRIBUTES?

Extended attributes are name/value pairs associated permanently with
inodes, similar to the environment strings associated with a process.
An attribute may be defined or undefined. If it is defined, its value
may be empty or non-empty.


WHAT ARE THEY USED FOR?

They are used by the kernel for storing system objects such as a
file's access control list [1], and by processes for storing various
other pieces of information. (They are not used for storing additional
streams of information, like HFS resource forks.)
Other examples are the capabilities [1] associated with an executable,
or maybe the MIME type or encoding of a file's contents.


WHY DO WE NEED THAT?

Various recent and not-so-recent OS developments require storing
additional pieces of information with files and directories. Prominent
examples are the trusted OS extensions, including Access Control
Lists, Capabilities and Mandatory Access Control.

Linux so far does not provide support for permanently storing such
additional meta-data on its filesystems. Extended attributes provide
the concepts and mechanisms to fill that void. This will allow Linux
to catch up with the major UNIX systems in these important
security-related areas.

Since the concepts are rather general, many other interesting
applications of extended attributes will probably also be "invented".


CURRENT USES

A complete and close to production quality implementation of
POSIX-like access control lists, implemented on top of extended
attributes, exists. This patch is kept in sync with the extended
attributes patch.

Andrew Morgan has some code against an earlier version of the extended
attributes patch that implements filesystem capabilities. This patch
is out of date at the moment.


EXTENDED ATTRIBUTE SEMANTICS

(The semantics described here are an extension of concepts implemented
in comparable UNIX systems; cf. [2] [3] [4].)

Extended attribute values are relatively small, so that they can be
passed between the kernel and user space in pre-allocated buffers.
Operations on extended attributes are atomic, i.e., attribute values
are retrieved entirely in one system call. Setting a new value for an
extended attribute overwrites the previous value.

Different extended attributes require different permissions for
reading and writing, and may also restrict the values accepted. There
are two different extended attribute types to support that
requirement. The type of an extended attribute is determined by its
name.

Extended user attribute names start with a letter of the English
alphabet ([A-Za-z]). Extended user attributes are subject to the same
restrictions as the contents of a file. The file owner determines who
is allowed to create, read and/or write extended attributes.

Extended system attribute names start with a '$' character. The kernel
determines who is allowed to manipulate which extended system
attributes, and which attribute values are accepted.

The complete list of attribute names defined for an inode may be
retrieved by any process with search access to the inode.

See <http://acl.bestbits.at/doc/ext_attr.5.html> for more information.


SYSTEM CALLS AND VFS INTERFACE

The current implementation features three additional system calls. The
ext_attr_path() and ext_attr_fd() system calls manipulate extended
attributes of an inode by path name and file descriptor, respectively.
The ext_attr_proc() system call is reserved. It will be used to
manipulate extended attributes associated with a process, such as a
per-process access control list.

See <http://acl.bestbits.at/man/ext_attr_path.html> for details on the
system calls.

In the kernel, the extended attribute name determines which handler
takes care of performing the requested operations. All extended user
attributes are handled by the same handler. For each extended system
attribute name, a different handler is registered that takes care of
operations on that extended system attribute. The kernel rejects
attempts to manipulate extended system attributes for which no handler
exists. Each handler checks for the appropriate permissions, and may
also restrict the accepted values as appropriate.

At the VFS layer, the two additional inode operations get_ext_attr and
set_ext_attr are added. They may be implemented by filesystems that
support extended attributes. The handlers are expected to use these
operations. The filesystem treats extended attribute names and values
as opaque entities.
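
The name-based dispatch described above can be illustrated with a
small user-space sketch. It is not code from the patch: the
identifiers ea_classify, ea_find_handler and struct ea_handler, as
well as the handler labels, are hypothetical and chosen only for
illustration. The sketch assumes that a name starting with neither a
letter nor '$' is simply rejected, and uses the two system attribute
names "$acl" and "$defacl" from the ACL patch as the registered
handlers.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; they are not taken from the patch. */
enum ea_type { EA_INVALID, EA_USER, EA_SYSTEM };

/* The type of an extended attribute is determined by its name:
 * a leading [A-Za-z] means "user", a leading '$' means "system". */
enum ea_type ea_classify(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return EA_INVALID;

    char c = name[0];
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return EA_USER;
    if (c == '$')
        return EA_SYSTEM;
    return EA_INVALID;  /* assumption: anything else is rejected */
}

struct ea_handler {
    const char *name;   /* exact system attribute name; NULL for the user handler */
    const char *label;  /* illustrative tag only */
};

/* One handler serves every extended user attribute. */
static const struct ea_handler user_handler = { NULL, "user" };

/* One handler is registered per extended system attribute name. */
static const struct ea_handler system_handlers[] = {
    { "$acl",    "acl" },
    { "$defacl", "default-acl" },
};

/* Returns the handler responsible for the name, or NULL if the name is
 * invalid or is a system attribute for which no handler is registered. */
const struct ea_handler *ea_find_handler(const char *name)
{
    switch (ea_classify(name)) {
    case EA_USER:
        return &user_handler;
    case EA_SYSTEM:
        for (size_t i = 0;
             i < sizeof system_handlers / sizeof system_handlers[0]; i++) {
            if (strcmp(system_handlers[i].name, name) == 0)
                return &system_handlers[i];
        }
        return NULL;
    default:
        return NULL;
    }
}
```

In this sketch, any two user attribute names resolve to the same
handler, "$acl" and "$defacl" resolve to distinct handlers, and an
unregistered system name such as "$unknown" yields NULL, which
corresponds to the kernel rejecting the request. A real handler would
additionally carry the permission checks and value restrictions and
call the filesystem's get_ext_attr and set_ext_attr operations.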


BENEFITS AND LIMITATIONS OF THIS DESIGN

This simple architecture can be implemented efficiently, with
reasonable performance. Much flexibility remains for filesystem
implementations, without requiring the filesystems to duplicate code.
Keeping the filesystems dumb about extended attribute names, values
and permissions and centralizing the logic in the extended attribute
handlers starts to pay off when multiple filesystems implement
extended attributes.

The system calls and VFS interface don't need to maintain state
(similar to a file handle) across extended attribute operations. No
opening / closing operations are necessary. No additional locking
operations are required.

Efficient implementations are necessary for (at least some) extended
attributes, such as access control lists. Under heavy load, several
thousand accesses to access control lists can occur per second. (The
access control list patch over ext2 extended attributes achieves that
goal very well.)

The approach of passing extended attributes between the kernel and
user space in a pre-allocated buffer makes very large extended
attribute values expensive. Arbitrarily big extended attributes are
not supported. This is intentional. Extended attributes are not an
implementation of streams similar to HFS resource forks.

Extended attributes as proposed here fit very well into upcoming
standards (e.g., NFS, the pax utility). Many other UNIX-like operating
systems (e.g., Irix, Tru64, FreeBSD) also support comparable extended
attribute functionality.

Manipulating all extended attributes using the same system calls, and
having a unified name space for extended attributes, has several
advantages. Differentiating between different types of extended
attributes (user/system) by name simplifies filesystem
implementations, as filesystems typically wouldn't care about the type
of an extended attribute.
Using the same system calls avoids the introduction of lots of
additional system calls, each implementing a similar function in a
different way.

Given appropriate permissions, extended attributes can be backed up
using only the extended attribute system calls. Restoring is also
possible via the same mechanism.


EXT2 FILESYSTEM IMPLEMENTATION

My goals for implementing extended attributes on ext2 included
compatibility with existing ext2 filesystems and fast access to
extended attributes.

All extended attributes associated with one inode are stored on a
single disk block. Disk blocks for extended attributes are allocated
"raw". These blocks are not part of the data blocks of any inode. For
inodes with extended attributes, the i_file_acl field points to the
extended attributes block. This scheme allows retrieving the extended
attributes of an inode with only one disk access.

The i_file_acl field is used to point to the extended attributes block
of an inode. This field was originally reserved for implementing
access control lists. Since access control lists are implemented on
top of extended attributes, the i_file_acl field can be adopted for
extended attributes.

Especially with system objects such as access control lists, and
likely also with capabilities and other extended attributes, it is
very common for two inodes to have some identical extended attributes.
If the whole set of extended attributes is identical for two inodes,
these inodes can share the same extended attribute block.
(Unfortunately, with the current implementation, sharing is not
possible if any of the extended attributes differ. Sharing only part
of the attributes would require additional space in each inode, which
is not available on ext2.)

Detecting such identical blocks is achieved by computing hash values
of extended attribute blocks and caching them. When a new block is
created, the cache is consulted.
If an identical block is found, it is reused (only incrementing a
reference count on disk); otherwise a new block is allocated.

A couple of future extensions are possible with the on-disk format
used. This includes support for bigger extended attribute values, to
be stored externally (in regular inodes?). If needed, attribute
descriptors (the list of attribute names) could also be stored across
multiple disk blocks. Another optimization would be to implement a
table of commonly used attribute names, rather than storing all the
attribute names in each extended attribute block.

The disk blocks used for extended attributes are accounted for in the
i_blocks field of inodes, so no changes in the quota code are
necessary.


ACCESS CONTROL LISTS IMPLEMENTED AS EXTENDED ATTRIBUTES

The access control list implementation complies with Posix 1003.1e
draft standard 17 (the most recent version available).

The ACL of an inode is stored as an extended system attribute named
"$acl". The Default ACL of an inode is stored as an extended system
attribute named "$defacl". The extended attribute formats for ACLs and
Default ACLs are the same; the representation is architecture
independent. For use in the kernel, the extended attribute format is
converted into an in-memory format that can be processed faster.

Accesses to ACLs are very frequent, especially when working in deeply
nested directories with ACLs along the path. To speed up permission
checking and file creation, ACLs and default ACLs are also cached in
the in-memory inode in their in-memory representation.


USING THE PATCHES

Instructions for applying the patches can be found on the web site,
<http://acl.bestbits.at/>. In addition to the kernel patches, the
e2fsprogs package needs to be patched, so that e2fsck doesn't get
confused about extended attributes.

Utilities for manipulating extended attributes are also available. For
access control lists, draft standard 17 compliant utilities are
available.
In addition, patching the fileutils package is recommended.


REFERENCES

[1] Posix 1003.1e + 1003.2c draft standard 17,
    <http://www.guug.de/~winni/posix.1e/download.html>
    Defines security operating system extensions such as access
    control lists, capabilities and mandatory access control.

[2] Irix 6.5 attr(1) manual page,
    <http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=man&fname=/usr/share/catman/u_man/cat1/attr.z>

[3] FreeBSD extattr(9) manual page,
    <http://www.FreeBSD.org/cgi/man.cgi?query=extattr&apropos=0&sektion=0&manpath=FreeBSD+5.0-current&format=html>

[4] Compaq Tru64 proplist(4) manual page,
    <http://www.tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V50_HTML/MAN/MAN4/0200____.HTM>

------------------------------------------------------------------------
Andreas Gruenbacher, a.gruenbacher@computer.org
Contact information: http://www.bestbits.at/~ag/