Date:	Tue, 10 Apr 2001 12:28:59 -0700
From:	george anzinger <george@mvista.com>
To:	Jamie Lokier <lk@tantalophile.demon.co.uk>,
Subject: Re: No 100 HZ timer !

Just for your information we have a project going that is trying to come
up with a good solution for all of this:

http://sourceforge.net/projects/high-res-timers

We have a mailing list there where we have discussed much of the same
stuff.  The mailing list archives are available at sourceforge.

Let's separate this into findings and tentative conclusions :)

Findings:

a) The University of Kansas and others have done a lot of work here.

b) High resolution timer events can be had with or without changing HZ.

c) High resolution timer events can be had with or without eliminating
the 1/HZ tick.

d) The organization of the timer list should reflect the existence of
the 1/HZ tick or not.  The current structure is not optimal for a "tick
less" implementation.  Better would be strict expire order with indexes
to "interesting times".

e) The current organization of the timer list generates a hiccup every
2.56 seconds to handle "cascading".  Hiccups are bad.

f) As noted, the accounting timers (task user/system times) would be much
more accurate with the tick less approach.  The cost is added code in
both the system call and the schedule path.  
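
The 2.56 second figure in (e) is presumably just the 256 first order
lists (mentioned further down) walked once at a 100 Hz tick.  A minimal
sketch of that arithmetic; the function name is made up for illustration
and the 100 Hz value is an assumption taken from the subject line:

```c
#include <assert.h>

/*
 * Hypothetical helper: milliseconds needed to walk every first order
 * list once, i.e. the interval between cascades.
 * hz     - tick rate (assumed 100 here)
 * nlists - number of first order lists (256 in the message)
 */
static unsigned cascade_period_ms(unsigned hz, unsigned nlists)
{
    assert(hz > 0);
    return nlists * 1000u / hz;
}
```

With hz = 100 and nlists = 256 this gives 2560 ms, which matches the
hiccup interval quoted in (e).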

Tentative conclusions:

Currently we feel that the tick less approach is not acceptable due to
(f).  We felt that this added code would NOT be welcome AND would, in a
reasonably active system, have much higher overhead than any savings in
not having a tick.  Also (d) implies a list organization that will, at
the very least, be harder to understand.  (We have some thoughts here,
but abandoned the effort because of (f).)  We are, of course, open to
discussion on this issue and all others related to the project
objectives.

We would reorganize the current timer list structure to eliminate the
cascade (e) and to add higher resolution entries.  The higher resolution
entries would carry an additional word which would be the fraction of a
jiffie that needs to be added to the jiffie value for the timer.  This
fraction would be in units defined by the platform to best suit the sub
jiffie interrupt generation code.  Each of the timer lists would then be
ordered by time based on this sub jiffie value.  In addition, in order
to eliminate the cascade, each timer list would carry all timers for
times that expire on the (jiffie mod (size of list)).  Thus, with the
current 256 first order lists, all timers with the same (jiffies & 255)
would be in the same list, again in expire order.  We also think that
the list size should be configurable to some power of two.  Again we
welcome discussion of these issues.
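
One possible reading of the layout described above, as a rough sketch
only and not the project's actual code.  All names here (hr_timer,
sub_expires, NLISTS, hr_timer_add, hr_timer_pop_due) are hypothetical,
and plain singly linked lists stand in for whatever list type would
really be used:

```c
#include <assert.h>
#include <stddef.h>

#define NLISTS 256u                 /* assumed; must be a power of two */

struct hr_timer {
    unsigned long expires;          /* jiffie at which the timer expires */
    unsigned long sub_expires;      /* fraction of a jiffie, platform units */
    struct hr_timer *next;
};

static struct hr_timer *lists[NLISTS];

/* True if a expires strictly before b; the signed difference keeps the
 * jiffie comparison correct across counter wrap. */
static int expires_before(const struct hr_timer *a, const struct hr_timer *b)
{
    if (a->expires != b->expires)
        return (long)(a->expires - b->expires) < 0;
    return a->sub_expires < b->sub_expires;
}

/* Insert t into the list selected by (expires & (NLISTS - 1)), keeping
 * that list in expire order.  No cascade step is ever needed because a
 * timer never moves to another list. */
static void hr_timer_add(struct hr_timer *t)
{
    struct hr_timer **pp;

    assert((NLISTS & (NLISTS - 1)) == 0);
    pp = &lists[t->expires & (NLISTS - 1)];
    while (*pp && !expires_before(t, *pp))
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
}

/* Detach and return the next timer due at jiffie now, or NULL if the
 * list is empty or its head belongs to a later pass over the lists. */
static struct hr_timer *hr_timer_pop_due(unsigned long now)
{
    struct hr_timer **head = &lists[now & (NLISTS - 1)];
    struct hr_timer *t = *head;

    if (!t || (long)(now - t->expires) < 0)
        return NULL;
    *head = t->next;
    t->next = NULL;
    return t;
}
```

The trade-off in this sketch is that insertion walks the list, so a
timer set far in the future shares a list with every nearer timer that
has the same (jiffies & 255); making the number of lists configurable
would be one way to tune that.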

George

Alan Cox wrote:

>> It's also all interrupts, not only syscalls, and also context switch if you
>> want to be accurate.

>We dont need to be that accurate. Our sample rate is currently so low the
>data is worthless anyway