[LWN Logo]
[LWN.net]
From:	 Ingo Molnar <mingo@elte.hu>
To:	 Linus Torvalds <torvalds@transmeta.com>
Subject: [patch] swap-speedup-2.4.3-A1, massive swapping speedup
Date:	 Mon, 23 Apr 2001 11:20:38 +0200 (CEST)
Cc:	 Alan Cox <alan@lxorguk.ukuu.org.uk>,
	 Linux Kernel List <linux-kernel@vger.kernel.org>,
	 <linux-mm@kvack.org>, Marcelo Tosatti <marcelo@conectiva.com.br>,
	 Rik van Riel <riel@conectiva.com.br>,
	 Szabolcs Szakacsits <szaka@f-secure.com>


background: sometime during the 2.4 cycle swapping performance and the
efficiency of the swapcache dropped significantly. I believe i finally
managed to find the problem that caused poor swapping performance and
annoying 'sudden process stoppage' symptoms.

(to those people who are experiencing swapping problems in 2.4: please try
the attached swap-speedup-2.4.3-A1 patch and let me know whether it works
& helps. The patch is against 2.4.4-pre6 or 2.4.3-ac12 and works just fine
on UP and SMP systems here.)

the problem is in lookup_swap_cache(). Per design it's a read-only,
lightweight function that just looks up the swapcache and reestablishes
the mapping if there is an entry still present. (ie. in most cases this
means that there is a fresh swapout still pending). In reality
lookup_swap_cache() did not work as intended: pages were locked in most of
the cases due to swap-out WRITEs, which caused the find_lock_page() to act
as a synchronization point - it blocked until the writeout finished. (!!!)
This is highly inefficient and undesirable - a lookup cache should not and
must not serialize with writeouts.

the reason why lookup_swap_cache() locks the page is due to a valid race,
but the solution excessive: it tries to keep the lookup atomic against
destroyers of the page, page_launder() and reclaim_page(). But it does not
really need the page lock for this - what it needs is atomicity against
swapcache updates. The same atomicity can be achieved by taking the LRU
and pagecache locks, the PageSwapCache() check is now embedded in a new
function: __find_get_swapcache_page().

the patch dramatically improves swapping performance in the tests i've
tried: swap-trashing tests that used to effectively lock the system up,
are chugging along just fine now, and the system is still more than
usable. The performance bug basically killed all good effects of the
swapcache. Swap-in latency of swapped-out processes has decreased
significantly, and overall swapping throughput has increased and
stabilized.

I'd really like to ask all MM developers to take some time to lean back
and verify current code to find similar performance bugs, instead of
trying to hack up new functionality to hide symptoms of bad design or bad
implementation. (for example there are some plans to add "avoid trashing
via process suspension" heuristics, which just work around real problems
like this one. With such code in place i'd probably never have found this
problem.) I believe we have most of the VM functionality in place to have
a world-class VM (most of which is new), what we now need is reliable and
verified behavior, not more complexity.

[ i'd also like to ask for new methods to create 'swap trashing', right
now i cannot make my system unusable via excessive swapping. (i'm
currently using the sieve.c memory trasher from Cesar Eduardo Barros, this
code used to produce the most extreme trashing load - it works just fine
now.) ]

	Ingo

--- linux/mm/filemap.c.orig	Mon Apr 23 13:35:06 2001
+++ linux/mm/filemap.c	Mon Apr 23 13:37:09 2001
@@ -710,6 +710,34 @@
 }
 
 /*
+ * Find a swapcache page (and get a reference) or return NULL.
+ * The SwapCache check is protected by the pagecache lock.
+ */
+struct page * __find_get_swapcache_page(struct address_space *mapping,
+			      unsigned long offset, struct page **hash)
+{
+	struct page *page;
+
+	/*
+	 * We need the LRU lock to protect against page_launder().
+	 */
+	spin_lock(&pagecache_lock);
+	spin_lock(&pagemap_lru_lock);
+
+	page = __find_page_nolock(mapping, offset, *hash);
+	if (page) {
+		if (PageSwapCache(page))
+			page_cache_get(page);
+		else
+			page = NULL;
+	}
+	spin_unlock(&pagemap_lru_lock);
+	spin_unlock(&pagecache_lock);
+
+	return page;
+}
+
+/*
  * Same as the above, but lock the page too, verifying that
  * it's still valid once we own it.
  */
--- linux/mm/swap_state.c.orig	Mon Apr 23 13:35:06 2001
+++ linux/mm/swap_state.c	Mon Apr 23 13:37:01 2001
@@ -163,37 +163,18 @@
 		/*
 		 * Right now the pagecache is 32-bit only.  But it's a 32 bit index. =)
 		 */
-repeat:
-		found = find_lock_page(&swapper_space, entry.val);
+		found = find_get_swapcache_page(&swapper_space, entry.val);
 		if (!found)
 			return 0;
-		/*
-		 * Though the "found" page was in the swap cache an instant
-		 * earlier, it might have been removed by refill_inactive etc.
-		 * Re search ... Since find_lock_page grabs a reference on
-		 * the page, it can not be reused for anything else, namely
-		 * it can not be associated with another swaphandle, so it
-		 * is enough to check whether the page is still in the scache.
-		 */
-		if (!PageSwapCache(found)) {
-			UnlockPage(found);
-			page_cache_release(found);
-			goto repeat;
-		}
+		if (!PageSwapCache(found))
+			BUG();
 		if (found->mapping != &swapper_space)
-			goto out_bad;
+			BUG();
 #ifdef SWAP_CACHE_INFO
 		swap_cache_find_success++;
 #endif
-		UnlockPage(found);
 		return found;
 	}
-
-out_bad:
-	printk (KERN_ERR "VM: Found a non-swapper swap page!\n");
-	UnlockPage(found);
-	page_cache_release(found);
-	return 0;
 }
 
 /* 
--- linux/include/linux/pagemap.h.orig	Mon Apr 23 13:35:05 2001
+++ linux/include/linux/pagemap.h	Mon Apr 23 13:37:01 2001
@@ -77,7 +77,12 @@
 				unsigned long index, struct page **hash);
 extern void lock_page(struct page *page);
 #define find_lock_page(mapping, index) \
-		__find_lock_page(mapping, index, page_hash(mapping, index))
+	__find_lock_page(mapping, index, page_hash(mapping, index))
+
+extern struct page * __find_get_swapcache_page (struct address_space * mapping,
+				unsigned long index, struct page **hash);
+#define find_get_swapcache_page(mapping, index) \
+	__find_get_swapcache_page(mapping, index, page_hash(mapping, index))
 
 extern void __add_page_to_hash_queue(struct page * page, struct page **p);