From: Solar Designer <solar@false.com>
Subject: 2.2.12
To: security-audit@ferret.lmh.ox.ac.uk
Date: Tue, 31 Aug 1999 10:03:37 +0400 (MSD)

Hi,

I've finally updated most of my kernel patch for Linux 2.2, and am
making it available for testing:

	ftp://ftp.openwall.com/linux-2.2.12-ow0.diff.gz

If everything goes well, I'll make an official release in a week,
when I update the README, etc.  (I am not going to leave that file
FTP'able after the patch is released; it's here for testing only.)

In addition to the usual features, I've included a few changes (bug
fixes and others) that I think should probably get into the standard
kernel (maybe 2.2.13?).  Those are not within any #ifdefs, and I
explain my reasoning for the changes below.

Alan, maybe you could look through the patch, and decide which of
those changes/fixes are getting in? :-)

While we're on the topic, the description of my patch in this list's
FAQ looks a bit dangerous to me.  It says that the patch "prevents
/tmp link attacks completely".  While this may be the case, if we're
only talking about "link attacks", people might think it makes all
/tmp-related vulnerabilities unexploitable.  In reality, at least DoS
attacks are often possible (but don't even require a "link", indeed),
and there're also various file permission vulnerabilities related to
temporary files (which have nothing to do with "links").  I'd like to
see some reminder in the FAQ, such as -- "(but not necessarily other
ways to exploit various /tmp-related vulnerabilities)" -- added right
after the word "completely" in the quote, above.  Sorry for spending
so much time stating the obvious, but it appears that those things
are important.

OK, now to the various 2.2.12 problems (some of which are definitely
not limited to 2.2).  First, there're two problems with execve(2) --
thanks to Tymm Twillman for reporting one, and making me look at the
code more closely.  The one he reported was originally believed to be
a sign bug (and 2.2.13pre1 even includes a non-working fix for it), but
is in reality a missing error check on the return from strlen_user().
BTW, 2.0 did this length calculation differently, and isn't vulnerable.
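
To illustrate the kind of check that was missing, here is a minimal
userspace sketch.  It is not the kernel code: fake_strlen_user() is a
hypothetical stand-in for strlen_user(), and it assumes the convention
that the function returns the length including the terminating NUL,
with 0 meaning the pointer could not be read.  A NULL pointer plays the
role of a faulting user address, and total_arg_bytes() is likewise an
invented name.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's strlen_user().  Assumed
 * convention: returns the string length INCLUDING the terminating NUL,
 * or 0 if the "user" pointer cannot be read.  NULL simulates a fault.
 */
static long fake_strlen_user(const char *s)
{
    long n = 0;

    if (s == NULL)
        return 0;
    while (s[n] != '\0')
        n++;
    return n + 1;
}

/*
 * Sum the sizes of argc argument strings.  Returns the total number of
 * bytes, or -EFAULT as soon as one string turns out to be unreadable.
 */
static long total_arg_bytes(const char *const *argv, int argc)
{
    long total = 0;
    int i;

    for (i = 0; i < argc; i++) {
        long len = fake_strlen_user(argv[i]);

        if (len == 0)   /* the error check; without it a fault goes unnoticed */
            return -EFAULT;
        total += len;
    }
    return total;
}
```

Without the len == 0 test, an unreadable string would silently be
treated as a zero-length one and the caller would carry on with a bogus
total.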

The other problem is that a user can cause the kernel to spend lots of
cycles counting arguments and measuring their lengths, even though
they would never fit into the 128 KB anyway.  I was able to make it
waste several seconds per call, and that's without any mm tricks,
yet.  Do that in a loop, and the system effectively halts.

I've fixed the counting of arguments, but not the strlen_user() call,
yet (I mean its performance).  This has already reduced the impact of
the problem significantly, as strlen_user() is architecture-specific
and is pretty efficient (runs at the full memory bandwidth on many
systems).  However, by using the mm tricks and ordering the pages in
a way that will stop the caches from doing their job (which is pretty
easy), one might provide 1 GB of data (or more), without a single NUL
byte in there, and get to the same several seconds per call (assuming
a 300 MB/sec memory bandwidth of a modern system).  Of course, they
don't need to have access to that much physical memory, only to that
much address space.  This means that a workaround is to set RLIMIT_AS
to something like 32 MB, which you should probably be doing anyway.
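
For reference, the RLIMIT_AS workaround can be applied from a program
with setrlimit(2).  The sketch below is only one way to do it; the
helper name limit_address_space() is made up for this example, and the
32 MB figure is the one suggested above.

```c
#include <sys/resource.h>

/*
 * Cap the address space of the calling process at limit_bytes, setting
 * both the soft and the hard limit.  The limit is inherited by children.
 * Returns 0 on success, or -1 with errno set (as setrlimit(2) does).
 */
static int limit_address_space(rlim_t limit_bytes)
{
    struct rlimit rl;

    rl.rlim_cur = limit_bytes;
    rl.rlim_max = limit_bytes;
    return setrlimit(RLIMIT_AS, &rl);
}
```

A login wrapper or daemon might call
limit_address_space((rlim_t)32 * 1024 * 1024) before running untrusted
code.  Note that lowering the hard limit cannot be undone by an
unprivileged process.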

This problem is also present in 2.0 kernels, so I'll port my partial
fix back to 2.0.38 when I update the 2.0 patch for it (this week).
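
The idea behind the partial fix, giving up on the argument count as
soon as it exceeds what could possibly fit, might be sketched as
follows.  This is a userspace illustration rather than the patch
itself: count_args(), ARG_SPACE_BYTES and the choice of -E2BIG as the
error are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* The 128 KB argument space limit mentioned above. */
#define ARG_SPACE_BYTES (128 * 1024)

/*
 * Count the entries of a NULL-terminated pointer array, but stop as
 * soon as more than max entries have been seen.  Returns the count, or
 * -E2BIG if the array is longer than max.
 */
static int count_args(const char *const *argv, int max)
{
    int n = 0;

    if (argv == NULL)
        return 0;
    while (argv[n] != NULL) {
        if (++n > max)
            return -E2BIG;  /* too many to ever fit; stop counting */
    }
    return n;
}
```

Since every argument occupies at least one byte (its NUL), a caller
could pass ARG_SPACE_BYTES as max: no longer list can fit, so the work
done per call stays bounded no matter how long the array is.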

In /proc, the PID directories and fd symlinks (both have numeric
names) can also be accessed with any number of zeroes prepended to
the names.  So, a "cd /proc/000000000$$" would work, and can be used
to obtain an overly long cwd.  No obvious security impact, but seems
like something we'd better fix.  I did, and will do the same for 2.0.
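
One way to reject such names is to accept only the canonical decimal
form when parsing.  The following is a sketch of that approach, not the
code from the patch; parse_canonical_number() and its max parameter are
hypothetical.

```c
#include <stddef.h>

/*
 * Parse a numeric name of the given length.  Returns the value, or -1
 * if the name is empty, contains a non-digit, has a leading zero (the
 * single name "0" is still accepted), or exceeds max.
 */
static long parse_canonical_number(const char *name, size_t len, long max)
{
    long value = 0;
    size_t i;

    if (len == 0)
        return -1;
    if (name[0] == '0' && len > 1)
        return -1;              /* "007" is not the canonical form of 7 */

    for (i = 0; i < len; i++) {
        int digit = name[i] - '0';

        if (digit < 0 || digit > 9)
            return -1;
        /* Reject before multiplying, so value never overflows. */
        if (value > max / 10 || value * 10 > max - digit)
            return -1;
        value = value * 10 + digit;
    }
    return value;
}
```

With this, each PID or fd has exactly one valid name, so the length of
a path component is bounded by the number of digits in max.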

CLONE_PID is still enabled in 2.2.13pre1. :-(  I think everyone has
already agreed it's a bad idea to leave this unrestricted.  In this
patch, I've decided to also restrict the exit signals one might set
with clone(2).  Here's the relevant comment:

 * Disallow unknown clone(2) flags, as well as CLONE_PID. Also, only
 * allow either SIGCHLD or SIGUSR2, or no signal, because of the lack
 * of checking in notify_parent()/send_sig_info() (the parent could
 * already be a SUID program that has done setuid(geteuid()), and thus
 * expects to be protected from signals sent by the original user).
 *
 * Note that the only reason we have to allow SIGUSR2 here, is that it
 * is used by LinuxThreads. This also means that SUID programs which
 * use LinuxThreads remain unprotected. Perhaps an extra signal mask
 * of requested exit signals could be kept for every process, cleared
 * on execve(2), and checked in notify_parent(), eliminating the need
 * to check the signal here.

Suggestions for a better fix are welcome.  I will port the fix to
2.0.38, which seems to also need it.
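
The check described in the comment could look roughly like the sketch
below.  It is a userspace illustration only.  The SKETCH_* constants
are written out by hand as assumptions about the 2.2 flag layout (exit
signal in the low byte, CLONE_PID as 0x1000), because current headers
no longer define CLONE_PID; the function name and the use of -EINVAL
are also assumptions.

```c
#include <errno.h>
#include <signal.h>

/* Assumed 2.2-era layout: the low byte of the flags is the exit signal. */
#define SKETCH_CSIGNAL      0x000000ffUL
#define SKETCH_CLONE_PID    0x00001000UL  /* shown for reference; never allowed */
/* Assumed set of known flags other than CLONE_PID:
 * VM, FS, FILES, SIGHAND, PTRACE, VFORK. */
#define SKETCH_KNOWN_FLAGS  (0x0100UL | 0x0200UL | 0x0400UL | \
                             0x0800UL | 0x2000UL | 0x4000UL)

/*
 * Returns 0 if the clone flags are acceptable, -EINVAL otherwise.
 * Rejects unknown flags and CLONE_PID, and allows only SIGCHLD,
 * SIGUSR2 or no signal as the exit signal.
 */
static int check_clone_flags(unsigned long flags)
{
    unsigned long sig = flags & SKETCH_CSIGNAL;

    if (flags & ~(SKETCH_CSIGNAL | SKETCH_KNOWN_FLAGS))
        return -EINVAL;         /* unknown flag, or CLONE_PID */
    if (sig != 0 && sig != SIGCHLD && sig != SIGUSR2)
        return -EINVAL;         /* exit signal not on the short list */
    return 0;
}
```

Because CLONE_PID is simply left out of the known set, it is rejected
by the same test that catches unknown flags.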

And the last non-configurable change the patch makes: chown is made
to behave the way it did in 2.0.  Here's the comment:

         * If the user or group of a non-directory has been changed by a
         * non-root user, remove the setuid bit.
         * 19981026     David C Niemi <niemi@tux.org>
         *
         * Changed this to apply to all users, including root, to avoid
         * some races. This is the behavior we had in 2.0. The check for
         * non-root was definitely wrong for 2.2 anyway, as it should
         * have been using CAP_FSETID rather than fsuid -- 19990830 SD.

In my opinion, we should switch to either the 2.0 behavior (this is
what I would prefer), or checking CAP_FSETID (to match the definition
of this capability, and the comment in capability.h).

Signed,
Solar Designer