Date: Thu, 25 May 2000 19:57:52 +1000 From: Andrew Morton <andrewm@uow.edu.au> To: lkml <linux-kernel@vger.rutgers.edu> Subject: [patches] kernel timer races In 2.2 and 2.3 we have a bit of a problem with kernel timers: A timer handler may still be running after the timer has been deleted. This is why del_timer_sync() was introduced in 2.3. But del_timer_sync() has problems... The kernel has 18 references to del_timer_sync and 668 references to del_timer. I expect that a great number of the del_timer() calls are racy, resulting in occasional failures such as: mainline() handler() ========== ========= enter xxx_timer() del_timer() kfree(some_resource) access(some_resource) The handler is accessing a resource after it has been released. mainline() handler() ========== ========= enter xxx_timer() del_timer() disable(some_device) access(some_device) The handler is touching some device after it has been shut down. mainline() handler() ========== ========= enter xxx_timer() del_timer() foo = 1; foo = 2; What is the value of 'foo'? mainline() handler() ========== ========= enter xxx_timer() del_timer() run... MOD_DEC_USE_COUNT run... sys_delete_module() run... run... The handler's text has been unloaded. mainline() handler() ========== ========= enter xxx_timer() del_timer() kfree(timer) mod_timer(timer) The latter is the worst, and unfortunately it is one of the most likely scenarios. The timer is still ticking over and will continue to do so, but its controlling timer_list is now in kfree'ed memory. The kernel is a dead man walking. Many of the net drivers can do this. I believe this needs fixing. Most of the timer handling in the kernel assumes that once del_timer() returns, the timer is deleted and the handler has terminated. I wonder how many developers even know that this is not true? The fix is to convert del_timer() to have synchronous semantics. That is: Upon return from del_timer(), the timer is not registered and the handler is not running. This implies that del_timer() must spin until the handler completes its run. This is in fact quite hard to do, because of the current very permissive timer API: - The handler may call del_timer() - The handler may call add_timer()/mod_timer(). - The handler may kfree() the timer_list storage. The last one is the killer because it means that run_timer_list() cannot hold state within the timer_list structure. Note that run_timer_list() is at present careful to not touch the timer_list after the handler returns. Alexey tells us that handlers may kfree the timer_list. We would be very interested in finding out which code does this. The current del_timer_sync() attempts to give us the desired synchronous semantics. But it has a few problems: - The handler must call timer_exit() to tell del_timer_sync() that it (the handler) has finished. Very, very few handlers currently call timer_exit(). - Once timer_exit() has been called, the mainline code can do a sys_delete_module() and unload the timer handler's text. Unlikely, but possible. - The current del_timer_sync() spins on timer->running in timer_synchronize(). After the handler returns, the timer_list could have been kfree'ed, but del_timer_sync reads and writes its members. I have put together several kernel modules: http://www.uow.edu.au/~andrewm/linux/timertest.tar.gz The 'showit.o' module demonstrates the existence of the problem. It consists of a kernel thread and a periodic timer. The kernel thread spins until the handler starts, then deletes the timer and then checks to see if the handler is still running. It shows that under 2.2 and 2.3 it is still running. The 'timertest' module is a bunch of test cases for the proposed patch. The 'kfree' module is a test case for some timer-used-after-kfree detection code (uses slab poisoning, not included in these patches). The 'deadlock' module forces a del_timer() deadlock (see below) to test the new deadlock detection/avoidance code. This patch uses timerlist_lock to serialise del_timer() and the handlers. I did considered serialising them with global_bh_lock. It's much of a muchness, and I didn't use global_bh_lock for several reasons: - I expect global_bh_lock will disappear one day, when timer handlers are no longer globally serialised. - 2.2 and 2.3 are quite different in their handling of BH locking. But they are quite similar in their top-level handling of timers. So patching run_timer_list() and del_timer() is more portable to 2.2. - No point in locking against _all_ BH handlers when we're only concerned with timer handlers. - Cleaner. The patch against 2.3.99-pre10-3 is at http://www.uow.edu.au/~andrewm/linux/timer-race-6.patch It does the following: - Maintains an pointer to the currently-running handler's timer_list. - In run_timer_list(), note the currently running timer in that pointer. - In del_timer(), check (under timerlist_lock) to see if this timer is running. If so, spin until it stops. - In internal_add_timer(), check to see if the newly added timer's handler is currently running. If it is, record that fact for del_timer(). - In del_timer(), after the handler has stopped running: if the timer was re-added (by either the handler or by another CPU), delete the darn thing again. - del_timer_sync() disappears. It's macroed onto del_timer(). - timer_exit() becomes a no-op. There's also a bit of temp diagnostic/debuggy stuff in this patch. - Detect the occurence of CPU0 doing a del_timer() during CPU1's execution of the handler. If this does occur, direct people to http://www.uow.edu.au/~andrewm/linux/del_timer.html Reminder: this scenario is the whole reason for this patch (and this long email). We need to gather these statistics so we can decide whether or not to patch 2.2. If I don't get any emails this whole thing was a waste of time. (But I've already seen it happen three times, and I haven't even tried...) - I was doing some work using slab poisoning to detect invalid timer_list usage. To support this I added some initialisation macros in timer.h: timer_list t = INIT_TIMER_FND(fn, d); For initialising a timer, given the handler and the 'data' initialiser. timer_list t = INIT_TIMER_FN(fn); Initialise the timer with the handler only. timer_list t = INIT_TIMER; Initialise to all zeroes. In list.h: #define LIST_HEAD_INIT_NULL { 0, 0 } For cleaner list_head initialisers. These changes allow us to centralise all timer_list initalisation within init_timer() and INIT_TIMER. - There are many files in the kernel tree which use statically initialised timers but fail to initialise them with init_timer(). I have a separate patch which changes all these so that stuff like: static timer_list t = { NULL, NULL, 0, 0, some_func }; becomes static timer_list t = INIT_TIMER_FN(some_func); This change was designed to allow the slab poisoning to be effective, but it is a worthwhile tidy-up in any case. This patch against 56 files is at http://www.uow.edu.au/~andrewm/linux/timer-tidy-6.patch - Deadlock detection: This del_timer synchronisation introduces a potential deadlock if your mainline and your timer handler both acquire the same lock: mainline() handler() ========== ========= spin_lock_irqsave(some_lock) spin_lock_irqsave(some_lock) del_timer() spin_unlock_irqrestore(some_lock) spin_unlock_irqrestore(some_lock) With the current async del_timer() semantics this would not deadlock. But with the proposed synchronous semantics, del_timer() will spin for ever. So in this patch the spin is limited to 5e8 loops, after which it complains, does a stack dump (x86 only), directs the user to http://www.uow.edu.au/~andrewm/linux/deadlock.html and proceeds. There is still a little race in this code, identified in del_timer(). This can be fixed by some wild cross-communication between del_timer() and run_timer_list(), but that fix will only obscure things more. This patch will get confused (and simply fall back to the current behaviour) if the handler re-adds the timer, then deletes it and then kfree's the timer's memory. Dumb thing to do anyway. Poisoning will pick this up. I don't expect that anything currently depends upon del_timer()'s asynchronous behaviour (how could it?). If this is required it would be simple to add a new function, del_timer_async(). ---- This fix would (I think) be cleaner and more efficient if we could store state within the timer_list structure, and examine that state within del_timer() and run_timer_list(): del_timer() { detach_timer() while (timer->running) ; if (timer_pending(timer)) detach_timer(); } run_timer_list() { detach_timer(); timer->running = 1; timer->function(); timer->running = 0; } toss in the appropriate locking and this is fine. But if the handler can kfree the timer_list, neither of these functions are allowed to touch the timer_list structure after the handler returns (or, in the case of del_timer(), while it is running). So to do the above, we must redefine the timer API. Namely, the handler _must_ return with the timer_list structure still validly accessible by run_timer_list() and del_timer(). That is, the handler is not allowed to kfree() or otherwise maliciously diddle with the timer_list. This is why we would be very interested in finding out which code does this... [ But are there other ways in which the handler could trash the timer_list apart from kfree()? ] -- -akpm- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/