Date:	Wed, 5 May 1999 14:54:40 -0400 (EDT)
From:	Phillip Ezolt <ezolt@perf.zko.dec.com>
To:	linux-kernel@vger.rutgers.edu
Subject: Overscheduling DOES happen with high web server load.

Hi all,

I've been doing some performance work with SPECWeb96 on Alpha/Linux and Apache,
and "schedule" looks like the main bottleneck.

(Kernel v2.2.5, Apache 1.3.4, egcs-1.1.1, iprobe-4.1)

When running a SPECWeb96 strobe run on Alpha/Linux, I found that when the
CPU is pegged, 18% of the time is spent in the scheduler.

Using Iprobe, I got the following function breakdown: (only functions >1%
are shown)

Begin            End                                    Sample Image Total
Address          Address          Name                   Count   Pct   Pct
-------          -------          ----                   -----   ---   ---
0000000000000000-00000000000029FC /usr/bin/httpd        127463        18.5 
00000001200419A0-000000012004339F   ap_vformatter        15061  11.8   2.2 
FFFFFC0000300000-00000000FFFFFFFF vmlinux               482385        70.1 
FFFFFC00003103E0-FFFFFC000031045F   entInt                7848   1.6   1.1
FFFFFC0000315E40-FFFFFC0000315F7F   do_entInt            48487  10.1   7.0
FFFFFC0000327A40-FFFFFC0000327D7F   schedule            124815  25.9  18.1
FFFFFC000033FAA0-FFFFFC000033FCDF   kfree                 7876   1.6   1.1
FFFFFC00003A9960-FFFFFC00003A9EBF   ip_queue_xmit         8616   1.8   1.3
FFFFFC00003B9440-FFFFFC00003B983F   tcp_v4_rcv           11131   2.3   1.6
FFFFFC0000441CA0-FFFFFC000044207F   do_csum_partial_copy_from_user  43112   8.9   6.3

I can't pin it down to the exact source line, but the cycles are spent in
close proximity to one another.

FFFFFC0000327A40 schedule vmlinux
FFFFFC0000327C1C   01DC      2160 (  1.7) *
FFFFFC0000327C34   01F4     28515 ( 22.8) **********************
FFFFFC0000327C60   0220      1547 (  1.2) *
FFFFFC0000327C64   0224     26432 ( 21.2) *********************
FFFFFC0000327C74   0234     36470 ( 29.2) *****************************
FFFFFC0000327C9C   025C     24858 ( 19.9) *******************       

(For those interested, I have the disassembled code.)

Apache has a fairly even cycle distribution, but in the kernel, 'schedule' 
really sticks out as the CPU burner. 

I think that the linear search for the next runnable process is where the
time is being spent.
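
To make that concrete, here is a simplified, self-contained sketch of the
kind of loop I mean.  It only approximates what schedule()/goodness() do in
v2.2 (it ignores SMP and several goodness details), and the struct and field
names are mine, not the kernel's:

#include <stddef.h>

struct task {
        struct task *next_run;  /* run-queue link */
        int counter;            /* remaining time slice */
        int priority;           /* static priority */
};

static int goodness(const struct task *p)
{
        /* Roughly: remaining quantum plus a priority bonus; a process that
         * has used up its quantum scores 0. */
        return p->counter ? p->counter + p->priority : 0;
}

struct task *pick_next(struct task *runqueue_head)
{
        struct task *p, *next = NULL;
        int best = -1000;

        /* Walk the whole run queue, recomputing goodness for every runnable
         * process.  The cost is linear in the run-queue length, and it is
         * paid on every single call to schedule(). */
        for (p = runqueue_head; p != NULL; p = p->next_run) {
                int weight = goodness(p);
                if (weight > best) {
                        best = weight;
                        next = p;
                }
        }
        return next;
}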

As an independent test, I ran vmstat while SPECWeb was running.

The leftmost column is the number of processes waiting to run.  These numbers
are above the 3 or 4 that are normally quoted.


 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 0 21 0   208  5968  5240 165712   0   0 4001  303 10263 6519  31  66   4
26 27 1   208  6056  5240 165848   0   0 2984   96 5623 3440  29  60  11
 0 15 0   208  5096  5288 166384   0   0 4543  260 10850 7346  32  66   3
 0 17 0   208  6928  5248 164936   0   0 5741  309 13129 8052  32  65   3
37 19 1   208  5664  5248 166144   0   0 2502  142 6837 3896  33  63   5
 0 14 0   208  5984  5240 165656   0   0 3894  376 12432 7276  32  65   3
 0 19 1   208  4872  5272 166248   0   0 2247  124 7641 4514  32  64   4
 0 17 0   208  5248  5264 166336   0   0 4229  288 8786 5144  31  67   2
56 16 1   208  6512  5248 165592   0   0 2159  205 8098 4641  32  62   6
94 18 1   208  5920  5248 165896   0   0 1745  191 5288 2952  32  60   7
71 14 1   208  5920  5256 165872   0   0 2063  160 6493 3729  30  62   8
 0 25 1   208  5032  5256 166544   0   0 3008  112 5668 3612  31  60   9
62 22 1   208  5496  5256 165560   0   0 2512  109 5661 3392  28  62  11
43 22 1   208  4536  5272 166112   0   0 3003  202 7198 4813  30  63   7
 0 26 1   208  4800  5288 166256   0   0 2407   93 5666 3563  29  60  11
32 17 1   208  5984  5296 165632   0   0 2046  329 7296 4305  31  62   6
23 7 1   208  6744  5248 164904   0   0 1739  284 9496 5923  33  65   2
14 18 1   208  5128  5272 166416   0   0 3755  322 9663 6203  32  65   3
 0 22 1   208  4256  5304 167288   0   0 2593  156 5678 3219  31  60   9
44 20 1   208  3688  5264 167184   0   0 3010  149 7277 4398  31  62   7
29 24 1   208  5232  5264 166248   0   0 1954  104 5687 3496  31  61   9
26 23 1   208  5688  5256 165568   0   0 3029  169 7124 4473  30  60  10
 0 18 1   208  5576  5256 165656   0   0 3395  270 8464 5702  30  63   7      

It looks like the run queue is much longer than expected. 

I imagine this problem is compounded by the number of times "schedule" is
called. 

On a webserver that does not have all of its web pages in memory, an httpd
process's life is the following (see the sketch after this list):

1. Wake up for a request from the network.
2. Figure out what web page to load.
3. Ask the disk for it.
4. Sleep (schedule()) until the page is ready.
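
As an illustration of where the sleep happens (this is NOT Apache code; the
function and names are mine), steps 2-4 boil down to something like:

#include <unistd.h>
#include <fcntl.h>

/* Illustrative request handler: map the request to a file, read it, and
 * send it back.  The read() of an uncached file blocks on the disk, so
 * the process sleeps and the kernel ends up in schedule(). */
void handle_request(int client_fd, const char *path)
{
        char buf[8192];
        ssize_t n;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
                return;

        while ((n = read(fd, buf, sizeof(buf))) > 0)    /* may block -> sleep */
                (void) write(client_fd, buf, (size_t) n);
        close(fd);
}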

This means that schedule will be called a lot.  In addition, a process will
wake and sleep in a time much shorter than its allotted time slice.

Each time we schedule, we have to walk through the entire run queue.  This
causes fewer requests to be serviced, which leaves more processes stuck on
the run queue, which makes the walk down the run queue even longer...
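
As a rough back-of-envelope figure (my estimate from the numbers above, not a
measurement): with run-queue lengths in the tens and the 3000-8000 context
switches per second shown in the cs column, the scheduler could easily be
recomputing goodness a few hundred thousand times per second just to pick the
next process.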

Bottom line: under a heavy web load, the Linux kernel seems to spend an
unnecessary amount of time scheduling processes.

Is it necessary to calculate the goodness of every process at every schedule?
Can't we make the goodnesses static?  Monkeying with the scheduler is big 
business, and I realize that this will not be a v2.2 issue, but what about 
v2.3? 
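
Purely as an illustration of what I mean by static goodness (my sketch, not a
worked-out proposal; it ignores SMP and recalculation entirely): if every
runnable process carried a precomputed goodness and sat in a bucket indexed
by it, picking the next process would not require touching the whole run
queue:

#include <stddef.h>

#define MAX_GOODNESS 64         /* assume goodness fits in 0..MAX_GOODNESS */

struct task {
        struct task *next;      /* bucket link */
        int goodness;           /* recomputed only when priority/counter change */
};

static struct task *bucket[MAX_GOODNESS + 1];
static int highest = -1;        /* index of the highest non-empty bucket */

void enqueue_task(struct task *p)
{
        p->next = bucket[p->goodness];
        bucket[p->goodness] = p;
        if (p->goodness > highest)
                highest = p->goodness;
}

struct task *pick_next(void)
{
        struct task *p;

        while (highest >= 0 && bucket[highest] == NULL)
                highest--;      /* skip buckets that have drained */
        if (highest < 0)
                return NULL;    /* nothing runnable */
        p = bucket[highest];
        bucket[highest] = p->next;
        return p;
}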

--Phil

Digital/Compaq:                     HPSD/Benchmark Performance Engineering
Phillip.Ezolt@compaq.com                            ezolt@perf.zko.dec.com

ps. <shameless plug> For those interested in more detail, a WIP paper
describing this work will be presented at Linux Expo. </shameless plug>


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/