Re: CPU affinity & IPI latency

From:	 David Lang <david.lang@digitalinsight.com>
To:	 Mike Kravetz <mkravetz@sequent.com>
Subject: Re: CPU affinity & IPI latency
Date:	 Fri, 13 Jul 2001 12:51:53 -0700 (PDT)
Cc:	 Larry McVoy <lm@bitmover.com>,
	 Davide Libenzi <davidel@xmailserver.org>,
	 lse-tech@lists.sourceforge.net, Andi Kleen <ak@suse.de>,
	 linux-kernel@vger.kernel.org

A real-world example of this issue.

I was gzipping a large (~800MB) file on a dual athlon box. the gzip prcess
was bouncing back and forth between the two CPUs. I actually was able to
gzip faster by starting up setiathome to keep one CPU busy and force the
scheduler to keep the gzip on a single CPU (I ran things several times to
verify it was actually faster)

David Lang


 On Fri, 13 Jul 2001, Mike Kravetz wrote:

> Date: Fri, 13 Jul 2001 10:05:21 -0700
> From: Mike Kravetz <mkravetz@sequent.com>
> To: Larry McVoy <lm@bitmover.com>
> Cc: Davide Libenzi <davidel@xmailserver.org>, lse-tech@lists.sourceforge.net,
>      Andi Kleen <ak@suse.de>, linux-kernel@vger.kernel.org
> Subject: Re: CPU affinity & IPI latency
>
> On Thu, Jul 12, 2001 at 05:36:41PM -0700, Larry McVoy wrote:
> > Be careful tuning for LMbench (says the author :-)
> >
> > Especially this benchmark.  It's certainly possible to get dramatically better
> > SMP numbers by pinning all the lat_ctx processes to a single CPU, because
> > the benchmark is single threaded.  In other words, if we have 5 processes,
> > call them A, B, C, D, and E, then the benchmark is passing a token from
> > A to B to C to D to E and around again.
> >
> > If the amount of data/instructions needed by all 5 processes fits in the
> > cache and you pin all the processes to the same CPU you'll get much
> > better performance than simply letting them float.
> >
> > But making the system do that naively is a bad idea.
>
> I agree, and can't imagine the system ever attempting to take this
> into account and leave these 5 tasks on the same CPU.
>
> At the other extreme is my observation that 2 tasks on an 8 CPU
> system are 'round robined' among all 8 CPUs.  I think having the
> 2 tasks stay on 2 of the 8 CPUs would be an improvement with respect
> to CPU affinity.  Actually, the scheduler does 'try' to do this.
>
> It is clear that the behavior of lat_ctx bypasses almost all of
> the scheduler's attempts at CPU affinity.  The real question is,
> "How often in running 'real workloads' are the schduler's attempts
> at CPU affinity bypassed?".
>
> --
> Mike Kravetz                                 mkravetz@sequent.com
> IBM Linux Technology Center
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/