Date:	Sun, 22 Oct 2000 16:23:53 +0200 (CEST)
From:	Andreas Gruenbacher <ag@bestbits.at>
To:	linux-fsdevel@vger.kernel.org
Subject: [PROPOSAL] Extended attributes for Posix security extensions

Hello,

This is a proposal to add extended attributes to the Linux kernel.
Extended attributes are name/value pairs associated with inodes.

A patch implementing extended attribute system calls, the VFS interface,
and code for the ext2 filesystem is available. All material is GPL and/or
LGPL licensed. The core implementation has been fairly stable for months.

New to the code base is a mechanism for sharing extended attribute blocks
among inodes on the ext2 filesystem. This is a significant optimization.
The block sharing code might still contain bugs. It can be deactivated at
kernel compile time.

Another patch that implements access control lists on top of extended
attributes is also available. This code has been in production on several
of my i386-based systems for a couple of months.

Here I will try to give an overview of the architecture and implementation
of extended attributes. The access control lists patch is not discussed in
detail. More information is available in manual pages and other documents
on the web site, <http://acl.bestbits.at/>.

I would like to discuss these issues with the final goal of getting the
code into the standard kernel. Comments welcome...


WHAT ARE EXTENDED ATTRIBUTES?

Extended attributes are name/value pairs associated permanently with
inodes, similar to the environment strings associated with a process. An
attribute may be defined or undefined. If it is defined, its value may be
empty or non-empty.


WHAT ARE THEY USED FOR?

They are used by the kernel for storing system objects such as a file's
access control list [1], and by processes for storing various other pieces
of information. (They are not meant for storing additional streams of
information, like HFS resource forks.) Other examples are the capabilities
[1] associated with an executable, or perhaps the MIME type or encoding of
a file's contents.


WHY DO WE NEED THAT?

Various recent and not-so-recent OS developments require storing
additional pieces of information with files and directories. Prominent
examples are trusted OS extensions, including Access Control Lists,
Capabilities and Mandatory Access Control.

So far, Linux does not provide support for permanently storing such
additional metadata on its filesystems. Extended attributes provide the
concepts and mechanisms to fill that void. This will allow Linux to catch
up with the major UNIX systems in these important security-related areas.
Since the concepts are rather general, many other interesting applications
of extended attributes will probably also be "invented".


CURRENT USES

A complete and close to production quality implementation of POSIX-like
access control lists, implemented on top of extended attributes, exists.
This patch is maintained to be in sync with the extended attributes patch.

Andrew Morgan has some code against an earlier version of the extended
attributes patch that implements filesystem capabilities. This patch is
out of date at the moment.


EXTENDED ATTRIBUTE SEMANTICS

(The semantics described here are an extension of concepts implemented
in comparable UNIX systems; cf. [2] [3] [4].)

Extended attribute values are relatively small, so that they can be passed
between the kernel and user space in pre-allocated buffers. Operations on
extended attributes are atomic, i.e., attribute values are retrieved
entirely in one system call. Setting a new value for an extended attribute
overwrites the previous value.
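A minimal C sketch of these semantics, assuming a simple in-memory attribute table for one inode. The names ea_get() and ea_set_value(), the size limits, and the error codes (-ENOENT for an undefined attribute, -ERANGE for a buffer that is too small) are hypothetical and not taken from the patch. The sketch shows the difference between an undefined attribute and a defined but empty one, whole-value retrieval into a caller's pre-allocated buffer, and overwrite on set.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one inode's extended attributes.
 * The limits below are illustrative assumptions, not the patch's. */
#define EA_MAX_ATTRS 16
#define EA_NAME_MAX  64
#define EA_VALUE_MAX 256

struct ea_entry {
    char name[EA_NAME_MAX];
    size_t len;                       /* 0 means defined but empty */
    unsigned char value[EA_VALUE_MAX];
};

struct ea_set {
    struct ea_entry entries[EA_MAX_ATTRS];
    size_t count;                     /* attributes not listed are undefined */
};

static struct ea_entry *ea_find(struct ea_set *set, const char *name)
{
    for (size_t i = 0; i < set->count; i++)
        if (strcmp(set->entries[i].name, name) == 0)
            return &set->entries[i];
    return NULL;
}

/* Copy the whole value into the caller's pre-allocated buffer in one call.
 * Returns the value length, -ENOENT if the attribute is undefined, or
 * -ERANGE if the buffer is too small (there are no partial reads). */
long ea_get(struct ea_set *set, const char *name, void *buf, size_t size)
{
    struct ea_entry *e = ea_find(set, name);
    if (e == NULL)
        return -ENOENT;
    if (e->len > size)
        return -ERANGE;
    if (e->len > 0)
        memcpy(buf, e->value, e->len);
    return (long)e->len;
}

/* Setting a value defines the attribute if needed and replaces any
 * previous value entirely. */
int ea_set_value(struct ea_set *set, const char *name,
                 const void *value, size_t len)
{
    struct ea_entry *e;
    if (strlen(name) >= EA_NAME_MAX || len > EA_VALUE_MAX)
        return -ERANGE;
    e = ea_find(set, name);
    if (e == NULL) {
        if (set->count == EA_MAX_ATTRS)
            return -ENOSPC;
        e = &set->entries[set->count++];
        strcpy(e->name, name);
    }
    if (len > 0)
        memcpy(e->value, value, len);
    e->len = len;
    return 0;
}
```

Refusing a too-small buffer with an error, instead of returning a truncated prefix, is what keeps each retrieval all-or-nothing.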

Different extended attributes require different permissions for reading
and writing, and may also restrict the values accepted. There are two
different extended attribute types to support that requirement. The type
of an extended attribute is determined by its name.

Extended user attribute names start with an English alphabet letter
([A-Za-z]). Extended user attributes are subject to the same restrictions
as the contents of a file. The file owner determines who is allowed to
create, read and/or write extended attributes.

Extended system attribute names start with a '$' character. The kernel
determines who is allowed to manipulate which extended system attributes,
and which attribute values are accepted.
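The name-based typing rule can be sketched in C as follows. The enum and the function name ea_name_type() are hypothetical; the character ranges come from the rules above. Explicit range checks are used instead of isalpha() so that the result does not depend on the current locale.

```c
enum ea_type { EA_INVALID, EA_USER, EA_SYSTEM };

/* Classify an extended attribute by the first character of its name:
 * [A-Za-z] -> user attribute, '$' -> system attribute, else invalid. */
enum ea_type ea_name_type(const char *name)
{
    unsigned char c = (unsigned char)name[0];

    if (c == '$')
        return EA_SYSTEM;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return EA_USER;
    return EA_INVALID;               /* includes the empty name */
}
```

With this sketch, "$acl" is classified as a system attribute, while a hypothetical user attribute name such as "mime_type" is classified as a user attribute.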

The complete list of attribute names defined for an inode may be retrieved
by any process with search access to the inode.

See <http://acl.bestbits.at/doc/ext_attr.5.html> for more information.


SYSTEM CALLS AND VFS INTERFACE

The current implementation features three additional system calls. The
ext_attr_path() and ext_attr_fd() system calls manipulate extended
attributes of an inode by path name and file descriptor, respectively. The
ext_attr_proc() system call is reserved. It will be used to manipulate
extended attributes associated with a process, such as a per-process
access control list.

See <http://acl.bestbits.at/man/ext_attr_path.html> for details on the
system calls.

In the kernel, the extended attribute name determines which handler takes
care of performing the requested operations. All extended user attributes
are handled by the same handler. For each extended system attribute name,
a different handler is registered that takes care of operations on that
extended system attribute. The kernel rejects attempts to manipulate
extended system attributes for which no handler exists.

Each handler checks for the appropriate permissions, and may also restrict
the accepted values as appropriate.
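A hedged sketch of this dispatch in C. The struct ea_handler layout, the function names, the error codes, and the value check in acl_set() (accepting only lengths that are a multiple of a hypothetical 8-byte entry size) are assumptions for illustration; "$acl" and "$defacl" are the system attribute names used by the ACL patch described further down. All user attributes share one handler, while each system attribute name needs its own registered handler, and unknown system names are rejected.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler interface: a handler checks permissions and may
 * restrict values before the opaque filesystem operation is performed. */
struct ea_handler {
    const char *name;   /* exact system attribute name, NULL for user attrs */
    int (*set)(const char *name, const void *value, size_t len);
};

static int user_set(const char *name, const void *value, size_t len)
{
    (void)name; (void)value; (void)len;
    return 0;   /* would check write access as for the file's contents */
}

static int acl_set(const char *name, const void *value, size_t len)
{
    (void)name; (void)value;
    /* Illustrative restriction: hypothetical 8-byte ACL entries. */
    return (len % 8 == 0) ? 0 : -EINVAL;
}

static const struct ea_handler user_handler = { NULL, user_set };
static const struct ea_handler system_handlers[] = {
    { "$acl",    acl_set },
    { "$defacl", acl_set },
};

/* Select the handler from the attribute name alone. */
const struct ea_handler *ea_find_handler(const char *name)
{
    if (name[0] == '$') {
        size_t n = sizeof(system_handlers) / sizeof(system_handlers[0]);
        for (size_t i = 0; i < n; i++)
            if (strcmp(system_handlers[i].name, name) == 0)
                return &system_handlers[i];
        return NULL;                 /* no handler: reject */
    }
    if ((name[0] >= 'A' && name[0] <= 'Z') || (name[0] >= 'a' && name[0] <= 'z'))
        return &user_handler;
    return NULL;
}

int ea_dispatch_set(const char *name, const void *value, size_t len)
{
    const struct ea_handler *h = ea_find_handler(name);
    if (h == NULL)
        return -EOPNOTSUPP;
    return h->set(name, value, len);
}
```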

At the VFS layer, two new inode operations, get_ext_attr and
set_ext_attr, are added. They may be implemented by filesystems that
support extended attributes. The handlers are expected to use these
operations. The filesystem treats extended attribute names and values as
opaque entities.
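To illustrate the split between handlers and filesystems, here is a sketch with hypothetical prototypes for get_ext_attr and set_ext_attr; the proposal names the two operations but does not give their signatures. The toy filesystem keeps a single attribute slot per inode and copies names and values without interpreting them, and handler_set() simply forwards the call once its own (omitted) checks have run.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct inode;

/* Hypothetical prototypes for the two new inode operations. The
 * filesystem sees names and values only as opaque strings and bytes. */
struct ea_inode_ops {
    long (*get_ext_attr)(struct inode *inode, const char *name,
                         void *buf, size_t size);
    int  (*set_ext_attr)(struct inode *inode, const char *name,
                         const void *value, size_t len);
};

struct inode {
    const struct ea_inode_ops *ops;  /* NULL: no extended attribute support */
    char name[32];                   /* toy storage: one attribute slot */
    unsigned char value[64];
    size_t len;
    int defined;
};

static long toyfs_get(struct inode *inode, const char *name,
                      void *buf, size_t size)
{
    if (!inode->defined || strcmp(inode->name, name) != 0)
        return -ENOENT;
    if (inode->len > size)
        return -ERANGE;
    if (inode->len > 0)
        memcpy(buf, inode->value, inode->len);
    return (long)inode->len;
}

static int toyfs_set(struct inode *inode, const char *name,
                     const void *value, size_t len)
{
    if (strlen(name) >= sizeof(inode->name) || len > sizeof(inode->value))
        return -ENOSPC;
    strcpy(inode->name, name);       /* the toy slot is simply replaced */
    if (len > 0)
        memcpy(inode->value, value, len);
    inode->len = len;
    inode->defined = 1;
    return 0;
}

const struct ea_inode_ops toyfs_ea_ops = { toyfs_get, toyfs_set };

/* A handler performs its checks, then calls through the inode operation. */
int handler_set(struct inode *inode, const char *name,
                const void *value, size_t len)
{
    if (inode->ops == NULL || inode->ops->set_ext_attr == NULL)
        return -EOPNOTSUPP;
    return inode->ops->set_ext_attr(inode, name, value, len);
}
```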


BENEFITS AND LIMITATIONS OF THIS DESIGN

This simple architecture can be implemented efficiently, with reasonable
performance. Much flexibility remains for filesystem implementations,
without requiring the filesystems to duplicate code. Keeping the
filesystems dumb about extended attribute names, values and permissions,
and centralizing that logic in the extended attribute handlers, starts to
pay off when multiple filesystems implement extended attributes.

The system calls and VFS interface don't need to maintain state (similar
to a file handle) across extended attribute operations. No opening /
closing operations are necessary. No additional locking operations are
required.

Efficient implementations are necessary for (at least some) extended
attributes, such as access control lists. Under heavy load, several
thousand accesses to access control lists can occur per second. (The
access control list patch over ext2 extended attributes achieves that goal
very well.)

The approach of passing extended attributes between the kernel and user
space in a pre-allocated buffer makes very large extended attribute values
expensive. Arbitrarily big extended attributes are not supported. This is
intentional. Extended attributes are not an implementation of streams
similar to HFS resource forks.

Extended attributes as proposed here fit very well into upcoming standards
(e.g., NFS, the pax utility). Many other UNIX-like operating systems
(e.g., Irix, Tru64, FreeBSD) also support comparable extended attribute
functionality.

Manipulating all extended attributes using the same system calls, and
having a unified name space for extended attributes, has several
advantages. Differentiating between different types of extended attributes
(user/system) by name simplifies filesystem implementations, as
filesystems typically wouldn't care about the type of an extended
attribute.

Using the same system calls avoids the introduction of lots of additional
system calls, each implementing a similar function in a different way.
Given appropriate permissions, extended attributes can be backed up
using only the extended attribute system calls. Restoring is also possible
via the same mechanism.


EXT2 FILESYSTEM IMPLEMENTATION

My goals for implementing extended attributes on ext2 included
compatibility with existing ext2 filesystems and fast access to extended
attributes.

All extended attributes associated with one inode are stored on a single
disk block. Disk blocks for extended attributes are allocated "raw". These
blocks are not part of the data blocks of any inode. For inodes with
extended attributes, the i_file_acl field points to the extended attributes
block.

This scheme allows retrieving the extended attributes of an inode with
only one disk access. The i_file_acl field is used to point to the
extended attributes block of an inode. This field was originally reserved
for implementing access control lists. Since access control lists are
implemented on top of extended attributes, the i_file_acl field can be
adopted for extended attributes.

Especially with system objects such as access control lists, and likely
also with capabilities and other extended attributes, it is very common
for two inodes to have some identical extended attributes. If the whole
set of extended attributes is identical for two inodes, these inodes can
share the same extended attribute block. (Unfortunately, with the current
implementation, sharing is not possible if any of the extended attributes
differ. This would require additional space in each inode, which is not
available on ext2.)

Detecting such identical blocks is achieved by computing hash values of
extended attribute blocks and caching them. When a new block is created,
the cache is looked up. If an identical block is found it is reused (only
incrementing a reference count on disk), otherwise a new block is
allocated.
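The sharing scheme can be sketched in C under several stated assumptions: a tiny array stands in for the disk, the cache is modeled as the stored per-block hashes scanned linearly (a real cache would be indexed by hash), and FNV-1a is used only as an example hash function, since the proposal does not say which hash the patch uses. A hash match is confirmed with a full comparison before a block is reused, because different contents can hash to the same value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EA_BLOCK_SIZE 64   /* tiny for illustration; real blocks are larger */
#define EA_NBLOCKS    8

struct ea_block {
    uint32_t refcount;     /* number of inodes sharing this block; 0 = free */
    uint32_t hash;         /* cached hash of data[] */
    unsigned char data[EA_BLOCK_SIZE];
};

static struct ea_block disk[EA_NBLOCKS];   /* stands in for on-disk blocks */

/* FNV-1a, chosen here only as a simple example hash. */
static uint32_t ea_hash(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the index of a block holding `data`: reuse an identical block by
 * bumping its reference count, otherwise allocate a free block.
 * Returns -1 if no free block is left. */
int ea_get_shared_block(const unsigned char data[EA_BLOCK_SIZE])
{
    uint32_t h = ea_hash(data, EA_BLOCK_SIZE);
    int free_idx = -1;

    for (int i = 0; i < EA_NBLOCKS; i++) {
        if (disk[i].refcount == 0) {
            if (free_idx < 0)
                free_idx = i;
            continue;
        }
        /* The hash is only a hint; compare contents to rule out collisions. */
        if (disk[i].hash == h && memcmp(disk[i].data, data, EA_BLOCK_SIZE) == 0) {
            disk[i].refcount++;
            return i;
        }
    }
    if (free_idx < 0)
        return -1;
    disk[free_idx].refcount = 1;
    disk[free_idx].hash = h;
    memcpy(disk[free_idx].data, data, EA_BLOCK_SIZE);
    return free_idx;
}

/* Drop one reference; the block is free again once no inode uses it. */
void ea_put_block(int idx)
{
    if (idx >= 0 && idx < EA_NBLOCKS && disk[idx].refcount > 0)
        disk[idx].refcount--;
}
```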

A couple of future extensions are possible with the on-disk format used.
These include support for bigger extended attribute values, to be stored
externally (in regular inodes?). If needed, attribute descriptors (the
list of attribute names) could also be stored across multiple disk blocks.
Another optimization would be to implement a table of commonly-used
attribute names, rather than storing all the attribute names in each
extended attribute block.

The disk blocks used for extended attributes are accounted for in the
i_blocks field of inodes, so no changes in the quota code are necessary.


ACCESS CONTROL LISTS IMPLEMENTED AS EXTENDED ATTRIBUTES

The access control list implementation complies with Posix 1003.1e draft
standard 17 (the most recent version available).

The ACL of an inode is stored as an extended system attribute named
"$acl". The Default ACL of an inode is stored as an extended system
attribute named "$defacl". The extended attribute formats for ACLs and
Default ACLs are the same; the representation is architecture independent.

For use in the kernel, the extended attribute format is converted into an
in-memory format that can be processed faster.
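A sketch of such a conversion, assuming a hypothetical on-disk entry layout of a 16-bit tag, a 16-bit permission set and a 32-bit id, all stored little-endian; the format actually used by the patch is documented on the web site and may differ. Assembling the fields byte by byte makes the decoding independent of the host's byte order and alignment rules.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk ACL entry: 8 bytes, little-endian fields, so the
 * same bytes decode identically on every architecture. */
#define ACL_DISK_ENTRY_SIZE 8

struct acl_entry {          /* native in-memory form */
    uint16_t tag;
    uint16_t perm;
    uint32_t id;
};

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode an extended attribute value into at most `max` native entries.
 * Returns the number of entries, or -EINVAL for a malformed value. */
int acl_from_disk(const unsigned char *value, size_t len,
                  struct acl_entry *out, size_t max)
{
    size_t n = len / ACL_DISK_ENTRY_SIZE;

    if (len % ACL_DISK_ENTRY_SIZE != 0 || n > max)
        return -EINVAL;
    for (size_t i = 0; i < n; i++) {
        const unsigned char *p = value + i * ACL_DISK_ENTRY_SIZE;
        out[i].tag  = le16(p);
        out[i].perm = le16(p + 2);
        out[i].id   = le32(p + 4);
    }
    return (int)n;
}
```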

Accesses to ACLs are very frequent, especially when working in deeply nested
directories with ACLs along the path. To speed up permission checking and
file creation, ACLs and default ACLs are also cached in the in-memory
inode in their in-memory representation.
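The following C sketch combines both ideas: the ACL is decoded at most once per in-memory inode and then reused, and the access check follows the POSIX.1e algorithm (owner entry first, then named user entries, then the owning group and named group entries, then other, with the mask entry limiting named users and all group entries). The tag and permission constants, the struct layout and the load_acl hook are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative tag and permission values (assumptions for this sketch). */
enum { ACL_USER_OBJ = 1, ACL_USER = 2, ACL_GROUP_OBJ = 4,
       ACL_GROUP = 8, ACL_MASK = 16, ACL_OTHER = 32 };
enum { ACL_READ = 4, ACL_WRITE = 2, ACL_EXECUTE = 1 };

#define ACL_MAX_ENTRIES 16

struct acl_entry { uint16_t tag; uint16_t perm; uint32_t id; };

struct cached_inode {
    uint32_t uid, gid;                 /* file owner and owning group */
    int acl_valid;                     /* 1 once the ACL has been decoded */
    size_t acl_count;
    struct acl_entry acl[ACL_MAX_ENTRIES];
    /* Hypothetical hook that reads and decodes the "$acl" attribute. */
    size_t (*load_acl)(const struct cached_inode *ino,
                       struct acl_entry *out, size_t max);
};

static int in_groups(uint32_t gid, const uint32_t *groups, size_t ngroups)
{
    for (size_t i = 0; i < ngroups; i++)
        if (groups[i] == gid)
            return 1;
    return 0;
}

/* Returns 1 if every permission bit in `want` is granted, else 0. */
int acl_permission(struct cached_inode *ino, uint32_t uid,
                   const uint32_t *groups, size_t ngroups, unsigned want)
{
    unsigned mask = 7, other = 0;      /* no mask entry: nothing is masked */
    const struct acl_entry *named_user = NULL;
    int group_seen = 0;

    if (!ino->acl_valid) {             /* decode once, then use the cache */
        ino->acl_count = ino->load_acl(ino, ino->acl, ACL_MAX_ENTRIES);
        ino->acl_valid = 1;
    }
    for (size_t i = 0; i < ino->acl_count; i++) {
        const struct acl_entry *e = &ino->acl[i];
        if (e->tag == ACL_USER_OBJ && uid == ino->uid)
            return (e->perm & want) == want;   /* owner: never masked */
        if (e->tag == ACL_USER && e->id == uid)
            named_user = e;
        else if (e->tag == ACL_MASK)
            mask = e->perm;
        else if (e->tag == ACL_OTHER)
            other = e->perm;
    }
    if (named_user != NULL)
        return (named_user->perm & mask & want) == want;

    /* Owning group and named groups: granted if any matching entry allows it. */
    for (size_t i = 0; i < ino->acl_count; i++) {
        const struct acl_entry *e = &ino->acl[i];
        int match = (e->tag == ACL_GROUP_OBJ && in_groups(ino->gid, groups, ngroups)) ||
                    (e->tag == ACL_GROUP && in_groups(e->id, groups, ngroups));
        if (match) {
            group_seen = 1;
            if ((e->perm & mask & want) == want)
                return 1;
        }
    }
    if (group_seen)
        return 0;
    return (other & want) == want;
}
```

Keeping the decoded entries in the in-memory inode means repeated path lookups only pay for the linear scans above, not for reading and parsing the attribute again.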


USING THE PATCHES

Descriptions on how to apply the patches can be found on the web site,
<http://acl.bestbits.at/>. In addition to the kernel patches, the
e2fsprogs package needs to be patched, so that e2fsck doesn't get confused
about extended attributes. Utilities for manipulating extended attributes
are also available.

For access control lists, draft standard 17 compliant utilities are
available. In addition, patching the fileutils package is recommended.


REFERENCES

[1] Posix 1003.1e + 1003.2c draft standard 17,
    <http://www.guug.de/~winni/posix.1e/download.html>

    Defines security operating system extensions such as access
    control lists, capabilities and mandatory access control.

[2] Irix 6.5 attr(1) manual page,
    <http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=man&fname=/usr/share/catman/u_man/cat1/attr.z>

[3] FreeBSD extattr(9) manual page,
    <http://www.FreeBSD.org/cgi/man.cgi?query=extattr&apropos=0&sektion=0&manpath=FreeBSD+5.0-current&format=html>

[4] Compaq Tru64 proplist(4) manual page,
    <http://www.tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V50_HTML/MAN/MAN4/0200____.HTM>



------------------------------------------------------------------------
 Andreas Gruenbacher, a.gruenbacher@computer.org
 Contact information: http://www.bestbits.at/~ag/
