d/timers

Date:	Thu, 25 May 2000 19:57:52 +1000
From:	Andrew Morton <andrewm@uow.edu.au>
To:	lkml <linux-kernel@vger.rutgers.edu>
Subject: [patches] kernel timer races

In 2.2 and 2.3 we have a bit of a problem with kernel timers:

     A timer handler may still be running after the timer has been
     deleted.

This is why del_timer_sync() was introduced in 2.3.  But
del_timer_sync() has problems...

The kernel has 18 references to del_timer_sync and 668 references to
del_timer.  I expect that a great number of the del_timer() calls are
racy, resulting in occasional failures such as:



     mainline()                       handler()
     ==========                       =========
                                      enter xxx_timer()
     del_timer()
     kfree(some_resource)
                                      access(some_resource)

The handler is accessing a resource after it has been released.




     mainline()                       handler()
     ==========                       =========
                                      enter xxx_timer()
     del_timer()
     disable(some_device)
                                      access(some_device)

The handler is touching some device after it has been shut down.




     mainline()                       handler()
     ==========                       =========
                                      enter xxx_timer()
     del_timer()
     foo = 1;                         foo = 2;

What is the value of 'foo'?




     mainline()                       handler()
     ==========                       =========
                                      enter xxx_timer()
     del_timer()                      run...
     MOD_DEC_USE_COUNT                run...
     sys_delete_module()              run...
                                      run...

The handler's text has been unloaded.



     mainline()                       handler()
     ==========                       =========
                                      enter xxx_timer()
     del_timer()
     kfree(timer)
                                      mod_timer(timer)


The latter is the worst, and unfortunately it is one of the most likely
scenarios.  The timer is still ticking over and will continue to do so,
but its controlling timer_list is now in kfree'ed memory.  The kernel
is a dead man walking.  Many of the net drivers can do this.



I believe this needs fixing.  Most of the timer handling in the kernel
assumes that once del_timer() returns, the timer is deleted and the
handler has terminated.  I wonder how many developers even know that
this is not true?


The fix is to convert del_timer() to have synchronous semantics.  That
is:

     Upon return from del_timer(), the timer is not registered and
     the handler is not running.

This implies that del_timer() must spin until the handler completes its
run.

This is in fact quite hard to do, because of the current very
permissive timer API:

     - The handler may call del_timer()
     - The handler may call add_timer()/mod_timer().
     - The handler may kfree() the timer_list storage.

The last one is the killer because it means that run_timer_list()
cannot hold state within the timer_list structure.  Note that
run_timer_list() is at present careful to not touch the timer_list
after the handler returns.

Alexey tells us that handlers may kfree the timer_list.  We would be
very interested in finding out which code does this.



The current del_timer_sync() attempts to give us the desired
synchronous semantics.  But it has a few problems:

     - The handler must call timer_exit() to tell
       del_timer_sync() that it (the handler) has finished.  Very, very
       few handlers currently call timer_exit().

     - Once timer_exit() has been called, the mainline code can
       do a sys_delete_module() and unload the timer handler's text. 
       Unlikely, but possible.

     - The current del_timer_sync() spins on timer->running in
       timer_synchronize().  After the handler returns, the timer_list
       could have been kfree'ed, but del_timer_sync reads and writes
       its members.


I have put together several kernel modules:

     http://www.uow.edu.au/~andrewm/linux/timertest.tar.gz

The 'showit.o' module demonstrates the existence of the problem.  It
consists of a kernel thread and a periodic timer.  The kernel thread
spins until the handler starts, then deletes the timer and then checks
to see if the handler is still running.  It shows that under 2.2 and
2.3 it is still running.

The 'timertest' module is a bunch of test cases for the proposed patch.

The 'kfree' module is a test case for some timer-used-after-kfree
detection code (uses slab poisoning, not included in these patches).

The 'deadlock' module forces a del_timer() deadlock (see below) to test
the new deadlock detection/avoidance code.


This patch uses timerlist_lock to serialise del_timer() and the
handlers.  I did considered serialising them with global_bh_lock.  It's
much of a muchness, and I didn't use global_bh_lock for several
reasons:

     - I expect global_bh_lock will disappear one day, when
       timer handlers are no longer globally serialised.

     - 2.2 and 2.3 are quite different in their handling of BH
       locking.  But they are quite similar in their top-level handling
       of timers.  So patching run_timer_list() and del_timer() is more
       portable to 2.2.

     - No point in locking against _all_ BH handlers when we're
       only concerned with timer handlers.

     - Cleaner.

The patch against 2.3.99-pre10-3 is at

     http://www.uow.edu.au/~andrewm/linux/timer-race-6.patch

It does the following:

     - Maintains an pointer to the currently-running handler's
       timer_list.

     - In run_timer_list(), note the currently running timer in
       that pointer.

     - In del_timer(), check (under timerlist_lock) to see if
       this timer is running.  If so, spin until it stops.

     - In internal_add_timer(), check to see if the newly added
       timer's handler is currently running.  If it is, record that
       fact for del_timer().

     - In del_timer(), after the handler has stopped running: if
       the timer was re-added (by either the handler or by another
       CPU), delete the darn thing again.

     - del_timer_sync() disappears.  It's macroed onto del_timer().

     - timer_exit() becomes a no-op.



There's also a bit of temp diagnostic/debuggy stuff in this patch.

     - Detect the occurence of CPU0 doing a del_timer() during
       CPU1's execution of the handler.  If this does occur, direct
       people to

          http://www.uow.edu.au/~andrewm/linux/del_timer.html

       Reminder: this scenario is the whole reason for this
       patch (and this long email).  We need to gather these statistics
       so we can decide whether or not to patch 2.2.  If I don't get
       any emails this whole thing was a waste of time.  (But I've
       already seen it happen three times, and I haven't even tried...)

     - I was doing some work using slab poisoning to detect
       invalid timer_list usage.  To support this I added
       some initialisation macros in timer.h:

       timer_list t = INIT_TIMER_FND(fn, d);

           For initialising a timer, given the handler and
           the 'data' initialiser.

       timer_list t = INIT_TIMER_FN(fn);

           Initialise the timer with the handler only.

       timer_list t = INIT_TIMER;

           Initialise to all zeroes.

       In list.h:

          #define LIST_HEAD_INIT_NULL { 0, 0 }

       For cleaner list_head initialisers.

       These changes allow us to centralise all timer_list
       initalisation within init_timer() and INIT_TIMER.

     - There are many files in the kernel tree which use
       statically initialised timers but fail to initialise them with
       init_timer().

       I have a separate patch which changes all these so that
       stuff like:

         static timer_list t = { NULL, NULL, 0, 0, some_func };

       becomes

         static timer_list t = INIT_TIMER_FN(some_func);

       This change was designed to allow the slab poisoning to
       be effective, but it is a worthwhile tidy-up in any case.

       This patch against 56 files is at

         http://www.uow.edu.au/~andrewm/linux/timer-tidy-6.patch

     - Deadlock detection:


This del_timer synchronisation introduces a potential deadlock if your
mainline and your timer handler both acquire the same lock:

     mainline()                       handler()
     ==========                       =========

     spin_lock_irqsave(some_lock)
                                      spin_lock_irqsave(some_lock)
     del_timer()
                                      spin_unlock_irqrestore(some_lock)
     spin_unlock_irqrestore(some_lock)

With the current async del_timer() semantics this would not deadlock. 
But with the proposed synchronous semantics, del_timer() will spin for
ever.  So in this patch the spin is limited to 5e8 loops, after which
it complains, does a stack dump (x86 only), directs the user to
http://www.uow.edu.au/~andrewm/linux/deadlock.html and proceeds.

There is still a little race in this code, identified in del_timer(). 
This can be fixed by some wild cross-communication between del_timer()
and run_timer_list(), but that fix will only obscure things more.

This patch will get confused (and simply fall back to the current
behaviour) if the handler re-adds the timer, then deletes it and then
kfree's the timer's memory.  Dumb thing to do anyway.  Poisoning will
pick this up.

I don't expect that anything currently depends upon del_timer()'s
asynchronous behaviour (how could it?).  If this is required it would
be simple to add a new function, del_timer_async().

----

This fix would (I think) be cleaner and more efficient if we could
store state within the timer_list structure, and examine that state
within del_timer() and run_timer_list():

del_timer()
{
	detach_timer()
	while (timer->running)
		;
	if (timer_pending(timer))
		detach_timer();
}

run_timer_list()
{
	detach_timer();
	timer->running = 1;
	timer->function();
	timer->running = 0;
}

toss in the appropriate locking and this is fine.  But if the handler
can kfree the timer_list, neither of these functions are allowed to
touch the timer_list structure after the handler returns (or, in the
case of del_timer(), while it is running).

So to do the above, we must redefine the timer API.  Namely, the
handler _must_ return with the timer_list structure still validly
accessible by run_timer_list() and del_timer().  That is, the handler
is not allowed to kfree() or otherwise maliciously diddle with the
timer_list.  This is why we would be very interested in finding out
which code does this...

  [ But are there other ways in which the handler could trash the
    timer_list apart from kfree()? ]


-- 
-akpm-

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/