From: Andrew Tridgell <tridge@valinux.com>
To: linux-kernel@vger.kernel.org
Subject: 2.4.8preX VM problems
Date: Tue, 31 Jul 2001 20:05:20 -0700 (PDT)

I've been testing the 2.4.8preX kernels on machines with fairly large
amounts of memory (greater than 1G) and have found them to have
disastrously bad performance through the buffer cache. If the machine
has 900M or less then it performs well, but above that the performance
drops through the floor (by about a factor of 600).

To see the effect use this:

    ftp://ftp.samba.org/pub/unpacked/junkcode/readfiles.c

and this:

    ftp://ftp.samba.org/pub/unpacked/junkcode/trd/

then do this:

    insmod dummy_disk.o dummy_size=80000000
    mknod /dev/ddisk b 241 0
    readfiles /dev/ddisk

"dummy_disk" is a dummy disk device (in this case it is 80G). All IOs
to the device succeed, but don't actually do anything. This makes it
easy to test very large disks on a small machine, and also eliminates
interactions with particular block devices. It also allows you to
unload the disk, which means you can easily start again with a clear
buffer cache. You can see exactly the same effect with a real device
if you would prefer not to load the dummy disk driver.

You will see that the speed is good for the first 800M then drops off
dramatically after that. Meanwhile, kswapd and kreclaimd go mad
chewing lots of CPU.

If you boot the machine with "mem=900M" then the problem goes away,
with the performance staying high. If you boot with 950M or above then
the throughput plummets once you have read more than 800M.

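For anyone who cannot fetch readfiles.c, the measurement above boils down to reading the device sequentially and printing the running total and the read rate at intervals. The following is only a rough Python sketch of that idea, not a port of readfiles.c: the names `read_throughput`, `CHUNK`, `report_every` and `clock`, and the 1 MB read size, are assumptions made for illustration.

```python
import sys
import time

CHUNK = 1024 * 1024  # bytes per read; an assumed value, not taken from readfiles.c


def read_throughput(path, report_every=1.0, clock=time.monotonic):
    """Read `path` sequentially, yielding (total_MB, MB_per_sec) per interval.

    A sample is produced whenever at least `report_every` seconds have
    passed since the previous one; the rate covers only that interval.
    """
    total = 0    # bytes read since the start
    window = 0   # bytes read since the last report
    last = clock()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(CHUNK)
            if not data:  # end of file or device
                break
            total += len(data)
            window += len(data)
            now = clock()
            if now - last >= report_every:
                yield total // 2**20, window / 2**20 / (now - last)
                window = 0
                last = now


# Hypothetical command-line use: python readrate.py /dev/ddisk
if __name__ == "__main__" and len(sys.argv) > 1:
    for mb, rate in read_throughput(sys.argv[1]):
        print(f"{mb} MB    {rate:g} MB/sec")
```

The reads go through ordinary buffered file IO on the block device, so they should exercise the buffer cache in the same way, but the absolute numbers will not match those of the C program.
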
Here is a sample run with 2.4.8pre3:

[root@fraud trd]# ~/readfiles /dev/ddisk
211 MB    211.754 MB/sec
404 MB    192.866 MB/sec
579 MB    175.188 MB/sec
742 MB    163.017 MB/sec
794 MB    49.5844 MB/sec
795 MB    0.971527 MB/sec
796 MB    0.94948 MB/sec
797 MB    1.35205 MB/sec
799 MB    1.30931 MB/sec
800 MB    1.16104 MB/sec
801 MB    1.30607 MB/sec
803 MB    1.67914 MB/sec
804 MB    1.1175 MB/sec
805 MB    0.645805 MB/sec
806 MB    0.749738 MB/sec
806 MB    0.555384 MB/sec
807 MB    0.330456 MB/sec
807 MB    0.320096 MB/sec
807 MB    0.320502 MB/sec
808 MB    0.33026 MB/sec

and on a real disk:

[root@fraud trd]# ~/readfiles /dev/rd/c0d1p2
37 MB    37.5002 MB/sec
76 MB    38.8103 MB/sec
115 MB    38.8753 MB/sec
153 MB    37.6465 MB/sec
191 MB    38.223 MB/sec
229 MB    38.276 MB/sec
267 MB    38.3151 MB/sec
305 MB    37.3374 MB/sec
343 MB    37.6915 MB/sec
380 MB    37.7198 MB/sec
418 MB    37.5222 MB/sec
455 MB    37.1729 MB/sec
492 MB    37.2008 MB/sec
529 MB    36.2474 MB/sec
565 MB    36.7173 MB/sec
602 MB    36.6197 MB/sec
639 MB    36.5568 MB/sec
675 MB    36.4935 MB/sec
711 MB    36.1575 MB/sec
747 MB    36.0858 MB/sec
784 MB    36.1972 MB/sec
799 MB    15.1778 MB/sec
803 MB    4.11846 MB/sec
804 MB    1.33881 MB/sec
805 MB    0.927079 MB/sec
806 MB    0.790508 MB/sec
807 MB    0.679455 MB/sec
807 MB    0.316194 MB/sec
808 MB    0.305104 MB/sec
808 MB    0.317431 MB/sec

Interestingly, the 800M barrier is the same no matter how much memory
is in the machine (i.e. it's the same barrier for a machine with 2G as
for one with 1G).

So, anyone have any ideas?

I was prompted to do these tests when I saw kswapd and kreclaimd going
mad in large SPECsfs runs on a machine with 2G of memory. I suspect
that what is happening is that the metadata throughput plummets during
the runs when the buffer cache reaches 800M in size. SPECsfs is very
metadata intensive. Typical runs will create millions of files.

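To pin down where a run falls over without eyeballing the output, the readfiles lines can be parsed and scanned for the first sample whose rate drops well below the starting rate. This is a small sketch in Python using the dummy-disk numbers from the 2.4.8pre3 run above. The helper names `parse_run` and `find_collapse` and the 50% threshold are arbitrary choices for illustration, not part of readfiles.

```python
import re

# Output of "readfiles /dev/ddisk" from the 2.4.8pre3 run above.
SAMPLE = """\
211 MB    211.754 MB/sec
404 MB    192.866 MB/sec
579 MB    175.188 MB/sec
742 MB    163.017 MB/sec
794 MB    49.5844 MB/sec
795 MB    0.971527 MB/sec
796 MB    0.94948 MB/sec
797 MB    1.35205 MB/sec
799 MB    1.30931 MB/sec
800 MB    1.16104 MB/sec
801 MB    1.30607 MB/sec
803 MB    1.67914 MB/sec
804 MB    1.1175 MB/sec
805 MB    0.645805 MB/sec
806 MB    0.749738 MB/sec
806 MB    0.555384 MB/sec
807 MB    0.330456 MB/sec
807 MB    0.320096 MB/sec
807 MB    0.320502 MB/sec
808 MB    0.33026 MB/sec
"""


def parse_run(text):
    """Turn readfiles output into a list of (total_MB, MB_per_sec) pairs."""
    pairs = re.findall(r"(\d+) MB\s+([\d.]+) MB/sec", text)
    return [(int(mb), float(rate)) for mb, rate in pairs]


def find_collapse(samples, fraction=0.5):
    """Return the first sample whose rate is below `fraction` of the first rate.

    Returns None if the rate never falls that far.
    """
    baseline = samples[0][1]
    for mb, rate in samples:
        if rate < fraction * baseline:
            return mb, rate
    return None


samples = parse_run(SAMPLE)
print("collapse at:", find_collapse(samples))
print("first/last rate ratio: %.0f" % (samples[0][1] / samples[-1][1]))
```

On this run the first sample below half the starting rate is the one at 794 MB, and the ratio of the first to the last reported rate is a little over 600, in line with the factor quoted at the top.
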
Cheers, Tridge
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/