From:	 Andrew Tridgell <tridge@valinux.com>
To:	 linux-kernel@vger.kernel.org
Subject: 2.4.8preX VM problems
Date:	 Tue, 31 Jul 2001 20:05:20 -0700 (PDT)

I've been testing the 2.4.8preX kernels on machines with fairly large
amounts of memory (greater than 1G) and have found them to have
disastrously bad performance through the buffer cache. If the machine
has 900M or less then it performs well, but above that the performance
drops through the floor (by about a factor of 600).

To see the effect use this:

ftp://ftp.samba.org/pub/unpacked/junkcode/readfiles.c

and this:

ftp://ftp.samba.org/pub/unpacked/junkcode/trd/

then do this:

insmod dummy_disk.o dummy_size=80000000
mknod /dev/ddisk b 241 0
readfiles /dev/ddisk
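In case you don't feel like fetching readfiles.c, it does essentially
the following: read the device sequentially in big chunks and print
the throughput over each reporting interval. This is just a sketch of
the idea, not the real source (the chunk size and the once-a-second
reporting are guesses); grab the URL above for the real thing:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define CHUNK (1024*1024)

static double tnow(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec*1e-6;
}

int main(int argc, char *argv[])
{
	char *buf = malloc(CHUNK);
	double total = 0, last_total = 0, last, t;
	ssize_t n;
	int fd;

	if (argc < 2 || !buf) {
		fprintf(stderr, "usage: readfiles <device>\n");
		exit(1);
	}
	fd = open(argv[1], O_RDONLY);
	if (fd == -1) {
		perror(argv[1]);
		exit(1);
	}

	last = tnow();
	while ((n = read(fd, buf, CHUNK)) > 0) {
		total += n;
		t = tnow();
		if (t - last >= 1.0) {
			/* report cumulative MB read, and MB/sec over
			   the last interval */
			printf("%.0f MB    %g MB/sec\n",
			       total/1.0e6,
			       (total - last_total)/1.0e6/(t - last));
			last = t;
			last_total = total;
		}
	}
	return 0;
}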

"dummy_disk" is a dummy disk device (in this case iits 80G). All IOs
to the device succeed, but don't actually do anything. This makes it
easy to test very large disks on a small machine, and also eliminates
interactions with particular block devices. It also allows you to
unload the driver, which means you can easily start again with a
clean buffer cache. You can see exactly the same effect with a real
device if you would prefer not to load the dummy disk driver.
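
For the curious, the core of a driver like this is tiny. The
following is roughly its shape under the 2.4 block layer; it's a
from-memory, untested sketch rather than the real source (names like
dummy_request and dummy_blk_size are made up here, and I'm inferring
from the 80000000 => 80G correspondence that dummy_size is in KB), so
grab the trd/ directory above for the real thing:

/* sketch: a block device where every request "succeeds" at once
   without moving any data */
#define MAJOR_NR 241
#define DEVICE_NAME "ddisk"
#define DEVICE_NR(device) (MINOR(device))
#define DEVICE_NO_RANDOM
#include <linux/module.h>
#include <linux/blk.h>

static int dummy_size = 1024;	/* size in KB (so 80000000 => 80G) */
MODULE_PARM(dummy_size, "i");

static int dummy_blk_size[1];	/* per-minor sizes for blk_size[] */

static void dummy_request(request_queue_t *q)
{
	while (1) {
		INIT_REQUEST;	/* returns when the queue is empty */
		end_request(1);	/* claim success, touch nothing */
	}
}

/* no open/release methods needed for the sketch */
static struct block_device_operations dummy_bdops;

int init_module(void)
{
	if (register_blkdev(MAJOR_NR, DEVICE_NAME, &dummy_bdops) < 0)
		return -EIO;
	blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), dummy_request);
	dummy_blk_size[0] = dummy_size;
	blk_size[MAJOR_NR] = dummy_blk_size;
	return 0;
}

void cleanup_module(void)
{
	blk_cleanup_queue(BLK_DEFAULT_QUEUE(MAJOR_NR));
	unregister_blkdev(MAJOR_NR, DEVICE_NAME);
}

Because end_request(1) marks the buffers up to date without filling
them, reads return whatever junk happened to be in the pages; that's
harmless when all you care about is throughput.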

You will see that the speed is good for the first 800M, then drops
off dramatically after that. Meanwhile, kswapd and kreclaimd go mad,
chewing lots of CPU.

If you boot the machine with "mem=900M" then the problem goes away,
with the performance staying high. If you boot with 950M or above
then the throughput plummets once you have read more than 800M.

Here is a sample run with 2.4.8pre3:

[root@fraud trd]# ~/readfiles /dev/ddisk 
211 MB    211.754 MB/sec
404 MB    192.866 MB/sec
579 MB    175.188 MB/sec
742 MB    163.017 MB/sec
794 MB    49.5844 MB/sec
795 MB    0.971527 MB/sec
796 MB    0.94948 MB/sec
797 MB    1.35205 MB/sec
799 MB    1.30931 MB/sec
800 MB    1.16104 MB/sec
801 MB    1.30607 MB/sec
803 MB    1.67914 MB/sec
804 MB    1.1175 MB/sec
805 MB    0.645805 MB/sec
806 MB    0.749738 MB/sec
806 MB    0.555384 MB/sec
807 MB    0.330456 MB/sec
807 MB    0.320096 MB/sec
807 MB    0.320502 MB/sec
808 MB    0.33026 MB/sec

and on a real disk:

[root@fraud trd]# ~/readfiles /dev/rd/c0d1p2 
37 MB    37.5002 MB/sec
76 MB    38.8103 MB/sec
115 MB    38.8753 MB/sec
153 MB    37.6465 MB/sec
191 MB    38.223 MB/sec
229 MB    38.276 MB/sec
267 MB    38.3151 MB/sec
305 MB    37.3374 MB/sec
343 MB    37.6915 MB/sec
380 MB    37.7198 MB/sec
418 MB    37.5222 MB/sec
455 MB    37.1729 MB/sec
492 MB    37.2008 MB/sec
529 MB    36.2474 MB/sec
565 MB    36.7173 MB/sec
602 MB    36.6197 MB/sec
639 MB    36.5568 MB/sec
675 MB    36.4935 MB/sec
711 MB    36.1575 MB/sec
747 MB    36.0858 MB/sec
784 MB    36.1972 MB/sec
799 MB    15.1778 MB/sec
803 MB    4.11846 MB/sec
804 MB    1.33881 MB/sec
805 MB    0.927079 MB/sec
806 MB    0.790508 MB/sec
807 MB    0.679455 MB/sec
807 MB    0.316194 MB/sec
808 MB    0.305104 MB/sec
808 MB    0.317431 MB/sec

Interestingly, the 800M barrier is the same no matter how much memory
is in the machine (i.e. it's the same barrier for a machine with 2G
as for one with 1G).

So, anyone have any ideas? 

I was prompted to do these tests when I saw kswapd and kreclaimd
going mad in large SPECsfs runs on a machine with 2G of memory. I
suspect that the metadata throughput plummets during those runs once
the buffer cache reaches 800M in size. SPECsfs is very
metadata-intensive; typical runs will create millions of files.

Cheers, Tridge