a/sct-vm

Date:	Mon, 25 Jan 1999 16:25:41 GMT
From:	"Stephen C. Tweedie" <sct@redhat.com>
To:	Linus Torvalds <torvalds@transmeta.com>
Subject: Re: MM deadlock [was: Re: arca-vm-8...]

Hi,

On Sat, 23 Jan 1999 16:19:13 -0800 (PST), Linus Torvalds
<torvalds@transmeta.com> said:

> Complexity is not a goal to be reached. Complexity is something to be
> avoided at all cost. If you don't believe me, look at NT.

Nevertheless, the 2.2.0-pre9 VM sucks.  I've been getting seriously
frustrated at pre-9's interactive feel over the past few days.

Linus, there really are fundamental problems remaining in the VM in
2.2.0-pre right now.  The two biggest are the lack of responsiveness
of kswapd and a general imbalance in the cache management.

The kswapd in pre9 is improved, but still only checks status at 1Hz.
Once we detect an out-of-memory condition, then yes, we increase that
frequency, but it means we can take a long time to start responding to
(say) a burst of network traffic, and the free list can be completely
exhausted long before kswapd notices.

The second issue, balancing, is evident in a general all-round
performance degradation when under VM load.  I notice this on 64M and on
8M.  Interactive response is simply poor all over, and build times are
excessive especially in low memory configurations.

Regarding the former, is there any chance you'd consider adding a kswapd
wakeup when low_memory gets set in get_free_pages()?  Being able to
respond to a burst in network traffic without locking up is not exactly
a minor issue.
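
The wakeup proposed above can be modelled in user space.  This is only
a sketch under stated assumptions, not kernel code: struct sim_vm,
sim_wake_kswapd() and sim_get_free_page() are hypothetical names made up
for illustration, and the real allocator is considerably more involved.
The point it shows is that the reclaim daemon is poked at the moment the
low-memory flag is first set, rather than at its next periodic check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of allocator state; not the kernel's data layout. */
struct sim_vm {
    int nr_free;         /* pages left on the free list */
    int low_watermark;   /* below this, memory counts as "low" */
    int low_memory;      /* set once the low condition is detected */
    int kswapd_wakeups;  /* how often the reclaim daemon was woken */
};

/* Stand-in for waking kswapd; here it only counts the wakeups. */
static void sim_wake_kswapd(struct sim_vm *vm)
{
    vm->kswapd_wakeups++;
}

/*
 * Take one page from the free list.  Returns 1 on success, 0 if the
 * list is empty.  The daemon is woken on the transition into the
 * low-memory state, so a burst of allocations does not have to wait
 * for the next timer-driven check.
 */
static int sim_get_free_page(struct sim_vm *vm)
{
    assert(vm != NULL);

    if (vm->nr_free < vm->low_watermark && !vm->low_memory) {
        vm->low_memory = 1;
        sim_wake_kswapd(vm);
    }
    if (vm->nr_free == 0)
        return 0;
    vm->nr_free--;
    return 1;
}
```

In this model a burst of allocations triggers exactly one wakeup, when
the free count first drops below the watermark.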

As for the balancing, the tiny patch below seems to completely restore
the responsiveness and throughput of the VM we had in the 132-ac*
kernels.  8MB builds are very much faster.  Responsiveness on memory
sizes up to 64MB is improved both when we have several competing tasks
running and when simply switching between applications.  vmstat shows
swapouts clustered well: I see between 3 and 6 times the swap throughput
that pre9 alone has, and swap bursts end in about a quarter of the time
they take under plain pre9.

The changes are very similar to the self-tuning clock counter we had in
those ac* VMs.  The modified shrink_mmap() just avoids decrementing the
count for locked pages, non-DMA pages (when GFP_DMA is requested) and
multiply-mapped pages.  The effect is to avoid counting memory mapped
pages when we trim cache.  In
low memory, this allows us to keep trimming back the "excess" unmapped
pure cache pages even if a large fraction of physical memory is occupied
by mapped pages.  
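
The effect of moving the decrement can be seen in a small user-space
model.  This is a sketch under stated assumptions, not the kernel
routine: struct sim_page and scan_candidates() are hypothetical, the DMA
check is left out, and the scan makes a single pass over the array
instead of wrapping around like the real clock hand.  The charge_all
flag selects between the pre9 accounting (every page visited costs one
unit of a doubled budget) and the patched accounting (only unlocked,
singly-used pages cost a unit).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct page. */
struct sim_page {
    int locked;  /* models PageLocked(page) */
    int users;   /* models atomic_read(&page->count) */
};

/*
 * Walk the pages in order and return how many freeable candidates
 * (unlocked, exactly one user) were examined before the scan budget
 * ran out.
 *
 * charge_all != 0: pre9 accounting, budget (limit << 1) >> priority,
 *                  charged for every page visited.
 * charge_all == 0: patched accounting, budget limit >> priority,
 *                  charged only for freeable candidates.
 */
static int scan_candidates(const struct sim_page *pages, size_t npages,
                           int limit, int priority, int charge_all)
{
    int count = charge_all ? (limit << 1) >> priority
                           : limit >> priority;
    int candidates = 0;
    size_t i;

    assert(pages != NULL && priority >= 0);

    for (i = 0; i < npages && count > 0; i++) {
        if (charge_all)
            count--;            /* old: pay before looking at the page */
        if (pages[i].locked)
            continue;
        if (pages[i].users != 1)
            continue;           /* mapped or shared: cannot be freed */
        if (!charge_all)
            count--;            /* new: pay only for real candidates */
        candidates++;
    }
    return candidates;
}
```

With 12 multiply-mapped pages followed by 4 pure cache pages, a limit
of 8 and a priority of 1, the pre9 accounting spends its whole budget
on mapped pages and examines no candidates, while the patched
accounting reaches all 4 cache pages.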

Right now, on my 64MB box this kernel is so much more responsive than
pre9 that it is scary.  Ditto 8MB.  Kernel builds also now proceed
without excessive cache trimming: even pre9 used to show large amounts
of disk read activity as the include file working set got tossed from
cache, but halving the "count" limit as below is enough to eliminate
that entirely.  The new limit also has the side effect of allowing
swapout to stream much more effectively, without any signs of the cache
growing to excess.  Sustained IO activity grows the cache to about the
same size as in previous kernels.

Up to you, take it or leave it, but right now one of the major benefits
we are touting for 2.2 over 2.0 is performance, and people will expect
2.2.0's performance to be representative of the 2.2.* series.  Right now
we are way behind the 131+ kernels on that front.

--Stephen

----------------------------------------------------------------
--- mm/filemap.c.~1~	Thu Jan 21 10:26:41 1999
+++ mm/filemap.c	Mon Jan 25 12:59:38 1999
@@ -125,7 +125,7 @@
 	struct page * page;
 	int count;
 
-	count = (limit << 1) >> priority;
+	count = limit >> priority;
 
 	page = mem_map + clock;
 	do {
@@ -147,7 +147,6 @@
 			clock = page - mem_map;
 		}
 		
-		count--;
 		referenced = test_and_clear_bit(PG_referenced, &page->flags);
 
 		if (PageLocked(page))
@@ -159,6 +158,8 @@
 		/* We can't free pages unless there's just one user */
 		if (atomic_read(&page->count) != 1)
 			continue;
+
+		count--;
 
 		/*
 		 * Is it a page swap page? If so, we want to
