Date: Mon, 13 Mar 2000 09:33:10 -0800 (PST)
From: Linus Torvalds <torvalds@transmeta.com>
To: Ingo Molnar <mingo@chiara.csoma.elte.hu>
Subject: Re: new IRQ scalability changes in 2.3.48

On Mon, 13 Mar 2000, Ingo Molnar wrote:
>
> Having said this, i now do agree that doing a preemptible kernel (which
> the Linux SMP kernel could become with a small amount of work) is a
> superior solution to this, wrt. latencies.

Note that doing a pre-emptive kernel based on the SMP work is truly
trivial: the only changes needed would be

 - the UP versions of the "spin_lock()/spin_unlock()" defines become
   incl/decl of global_spinlock_count

 - hardirq_enter/hardirq_exit probably need to do the same as the above.

 - the UP versions of the "lock_kernel()/unlock_kernel()" defines become
   incl/decl of current->lock_count

 - change the "are we in the kernel" test in the low-level interrupt code
   from

	CS == __KERNEL_CS

   to

	(global_spinlock_count | current->lock_count)

 - add a

	if (global_spinlock_count)
		BUG();

   to the top of the scheduler, and a

	if (CS != __KERNEL_CS && global_spinlock_count)
		BUG();

   to the return-to-user-mode path, just because there otherwise _will_ be
   bugs.

and you're now done. Tadaa! You have a pre-emptive UP kernel. Add a few
months of debugging (because something _will_ crop up, or my name isn't
Billy-Bob).

NOTE NOTE NOTE! You must NOT change the SMP case at all, including the
"are we in the kernel" test. Not only do we not have a global
spinlock_count (and we don't want one - it would be cache-line death), but
even if we used the above heuristic it would be seriously wrong on SMP,
because it would mean that anything that caches the value of "current CPU"
would need to lock. Which is just too expensive to even think about,
because it happens all over the place. On UP, that just isn't a problem ;)

There probably are numerous nasty small details that would crop up, but
I'd give it a 15% chance of just working on the first try.

Oh, and it's not going to be really really efficient.
It's going to increment and decrement global_spinlock_count a lot more
than strictly necessary, but any "clever" approach is just going to be too
painful to think about, and would make the UP locking too different from
the SMP case. You can do micro-optimizations like

 - spin_[un]lock_[irq|saveirq|restoreirq]() do not need to touch the
   spinlock_count, because anybody who has interrupts disabled won't get
   re-scheduled anyway.

but doing those kinds of clever things will mean that you'd better make
sure that nobody does the equivalent of

	spin_lock_irq(..);	/* count optimized away */
	...
	spin_unlock(..);	/* count NOT optimized away .. */
	...
	__sti();

which probably does happen right now in the scheduler and stuff..

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/