
Date:	Mon, 13 Mar 2000 09:33:10 -0800 (PST)
From:	Linus Torvalds <torvalds@transmeta.com>
To:	Ingo Molnar <mingo@chiara.csoma.elte.hu>
Subject: Re: new IRQ scalability changes in 2.3.48



On Mon, 13 Mar 2000, Ingo Molnar wrote:
> 
> Having said this, i now do agree that doing a preemptible kernel (which
> the Linux SMP kernel could become with a small amount of work) is a
> superior solution to this, wrt. latencies.

Note that doing a pre-emptive kernel based on the SMP work is truly
trivial: the only changes needed would be

 - the UP versions of all the "spin_lock()/spin_unlock()" defines become

	incl/decl global_spinlock_count

 - hardirq_enter/hardirq_exit probably need to do the same as the above.

 - the UP versions of the "lock_kernel()/unlock_kernel()" defines become

	incl/decl current->lock_count

 - change the "are we in the kernel" test in the low-level interrupt code
   from

	CS == __KERNEL_CS

   to

	(global_spinlock_count | current->lock_count)

 - add a

	if (global_spinlock_count) BUG()

   to the top of the scheduler and a

	if (CS != __KERNEL_CS && global_spinlock_count) BUG();

   to the return-to-user-mode path, just because there otherwise _will_ be
   bugs.

and you're now done. Tadaa! You have a pre-emptive UP kernel. Add a few
months of debugging (because something _will_ crop up, or my name isn't
Billy-Bob).
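
[For concreteness, here is a rough sketch of what those UP defines and
checks could look like. The names (global_spinlock_count, lock_count,
the CS test) are the ones from the list above; the exact form of the
defines and the placement of the checks is illustrative only, not a
tested patch:

	/* UP spin_lock()/spin_unlock(): just bump a global lock count */
	extern int global_spinlock_count;

	#define spin_lock(lock)		do { global_spinlock_count++; } while (0)
	#define spin_unlock(lock)	do { global_spinlock_count--; } while (0)

	/* hardirq_enter()/hardirq_exit() on UP would bump the same count */

	/* UP lock_kernel()/unlock_kernel(): count per-process */
	#define lock_kernel()		do { current->lock_count++; } while (0)
	#define unlock_kernel()		do { current->lock_count--; } while (0)

	/* low-level interrupt return: instead of "CS == __KERNEL_CS",
	   preempt only if (global_spinlock_count | current->lock_count)
	   is zero */

	/* at the top of schedule(): */
	if (global_spinlock_count)
		BUG();

	/* on the return-to-user-mode path: */
	if (CS != __KERNEL_CS && global_spinlock_count)
		BUG();
]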

NOTE NOTE NOTE! You must NOT change the SMP case at all, including the
"are we in the kernel" test. Not only do we not have a global
spinlock_count (and we don't want one - it would be cache-line death), but
even if we used the above heuristic it would be seriously wrong on SMP,
because it would mean that anything that caches the value of "current CPU"
would need to lock. Which is just too expensive to even think about,
because it happens all over the place. On UP, that just isn't a problem ;)
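
[A sketch of the kind of code that would break on SMP -- per_cpu_data
here is made up, but the pattern of caching smp_processor_id() is all
over the kernel:

	int cpu = smp_processor_id();	/* cache "which CPU am I on"        */
	...
	per_cpu_data[cpu].count++;	/* assumes we are still on that CPU */

	/* with kernel preemption on SMP, the task could be moved to
	   another CPU between those two lines, so every such sequence
	   would need a lock around it */
]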

There probably are numerous nasty small details that would crop up, but
I'd give it a 15% chance of just working on the first try.

Oh, and it's not going to be really really efficient. It's going to
increment and decrement global_spinlock_count a lot more than strictly
necessary, but any "clever" approach is just going to be too painful to
think about, and would make the UP locking too different from the SMP
case.

You can do micro-optimizations like

 - spin_[un]lock_[irq|saveirq|restoreirq]() do not need to touch the
   spinlock_count, because anybody who has interrupts disabled won't get
   re-scheduled anyway.

but doing those kinds of clever things will mean that you'd better make
sure that nobody does the equivalent of

	spin_lock_irq(..);	/* count optimized away */
	...
	spin_unlock(..);	/* count NOT optimized away .. */
	..
	__sti()

which probably does happen right now in the scheduler and stuff..
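
[To spell out why that sequence is a problem, assuming the "clever" UP
defines looked something like this (again only a sketch):

	#define spin_lock_irq(lock)	__cli()			/* count optimized away	    */
	#define spin_unlock(lock)	global_spinlock_count--	/* count NOT optimized away */

	spin_lock_irq(..);	/* count stays 0	*/
	...
	spin_unlock(..);	/* count goes to -1	*/
	__sti();		/* interrupts back on, but the count is now
				   permanently off by one, so the preemption
				   test and the BUG() checks above are
				   confused from here on */
]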

		Linus

