From: "Randy.Dunlap" <rddunlap@osdl.org>
To: <linux-kernel@vger.kernel.org>
Cc: <axboe@suse.de>, <andrea@suse.de>
Subject: [patch 2.5.8] bounce/swap stats
Date: Thu, 11 Apr 2002 18:21:25 -0700 (PDT)

Hi,

This patch adds stats for all bounce I/O and bounce swap I/O to
/proc/stat.  I've been testing bounce I/O and VM performance in
2.4.teens with the highio patch and in 2.5.x.

Summary:
 * 2.5.8-pre3 with highio runs to completion with an intense workload
 * 2.5.8-pre3 with "nohighio" and the same workload dies
 * 2.5.8-pre3 with "nohighio" and a lighter workload runs

[attachments contain /proc/stat for completed runs]

Here's the patch.  Jens, please apply to 2.5.8-N.

--- ./fs/proc/proc_misc.c.org	Thu Jan  3 09:16:31 2002
+++ ./fs/proc/proc_misc.c	Tue Jan  8 16:12:56 2002
@@ -324,6 +324,12 @@
 		xtime.tv_sec - jif / HZ,
 		total_forks);
 
+	len += sprintf(page + len,
+		"bounce_io %u %u\n"
+		"bounce_swap_io %u %u\n",
+		kstat.bouncein, kstat.bounceout,
+		kstat.bounceswapin, kstat.bounceswapout);
+
 	return proc_calc_metrics(page, start, off, count, eof, len);
 }
 
--- ./mm/highmem.c.org	Thu Jan  3 09:16:31 2002
+++ ./mm/highmem.c	Tue Jan  8 16:16:51 2002
@@ -21,6 +21,7 @@
 #include <linux/mempool.h>
 #include <linux/blkdev.h>
 #include <asm/pgalloc.h>
+#include <linux/kernel_stat.h>
 
 static mempool_t *page_pool, *isa_page_pool;
 
@@ -401,7 +402,10 @@
 		vfrom = kmap(from->bv_page) + from->bv_offset;
 		memcpy(vto, vfrom, to->bv_len);
 		kunmap(from->bv_page);
+		kstat.bounceout++;
 	}
+	else
+		kstat.bouncein++;
 }
 
 /*
--- ./include/linux/kernel_stat.h.org	Thu Jan  3 09:28:04 2002
+++ ./include/linux/kernel_stat.h	Tue Jan  8 16:10:20 2002
@@ -26,6 +26,8 @@
 	unsigned int dk_drive_wblk[DK_MAX_MAJOR][DK_MAX_DISK];
 	unsigned int pgpgin, pgpgout;
 	unsigned int pswpin, pswpout;
+	unsigned int bouncein, bounceout;
+	unsigned int bounceswapin, bounceswapout;
 #if !defined(CONFIG_ARCH_S390)
 	unsigned int irqs[NR_CPUS][NR_IRQS];
 #endif
--- ./mm/page_io.c.orig	Tue Apr  9 14:54:02 2002
+++ ./mm/page_io.c	Tue Apr  9 16:18:18 2002
@@ -10,11 +10,13 @@
  * Always use brw_page, life becomes simpler. 12 May 1998 Eric Biederman
  */
 
+#include <linux/config.h>
 #include <linux/mm.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
 #include <linux/locks.h>
 #include <linux/swapctl.h>
+#include <linux/blkdev.h>
 
 #include <asm/pgtable.h>
@@ -41,6 +43,7 @@
 	int block_size;
 	struct inode *swapf = 0;
 	struct block_device *bdev;
+	kdev_t kdev;
 
 	if (rw == READ) {
 		ClearPageUptodate(page);
@@ -54,6 +57,7 @@
 		zones[0] = offset;
 		zones_used = 1;
 		block_size = PAGE_SIZE;
+		kdev = swapf->i_rdev;
 	} else {
 		int i, j;
 		unsigned int block = offset
@@ -67,6 +71,19 @@
 		}
 		zones_used = i;
 		bdev = swapf->i_sb->s_bdev;
+		kdev = swapf->i_sb->s_dev;
+	}
+
+	{
+		request_queue_t *q = blk_get_queue(kdev); /* TBD: is kdev always correct here? */
+		zone_t *zone = page_zone(page);
+
+		if (q && (page - zone->zone_mem_map) + (zone->zone_start_paddr
+			  >> PAGE_SHIFT) >= q->bounce_pfn) {
+			if (rw == WRITE)
+				kstat.bounceswapout++;
+			else
+				kstat.bounceswapin++;
+		}
 	}
 
 	/* block_size == PAGE_SIZE/zones_used */

I'll keep looking into the "kernel dies" problem(s) that I'm seeing
[using more tools], but I have some data and a patch for 2.5.8
concerning bounce I/O and bounce swap statistics that I would like to
have integrated, so that both users and developers can have more
insight into how much bounce I/O is happening.

I'll generate the patch for 2.4.teens + highmem if anyone is interested
in it, or after highmem is merged into 2.4.  ...it will be added to
2.4, right?

There is a second patch (attached) that prints the device major:minor
of devices that are being used for bounce I/O
[258-bounce-identify.patch].  This is a user helper, not intended for
kernel inclusion.

Some of the symptoms that I usually see with the most memory-intensive
workloads are:

 . 'top' reports that all 8 processors are in system exec. state at the
   98-99% level, and the 'top' display is only updated every few
   minutes (it should update every 1 second)
 . Magic SysRq does not work when all 8 CPUs are tied up in system mode
 . a looping script (with 'sleep 1') that prints the last 50 lines of
   'dmesg' often doesn't print anything for 10-20 minutes and then
   finally comes back to life
 . I usually don't see a kernel death, just lack of response, or my
   sessions to the test system die

Comments?

Thanks,
~Randy

[2. text/plain; 258-bounce-identify.patch]...
[3. text/plain; 258p3hi.txt]...
[4. text/plain; 258p3nohi.txt]...