Subject: BadRAM patch: Tested, improved
To:	linux-kernel@vger.rutgers.edu, hildebrj@airwire.com,
Date:	Sat, 8 Apr 2000 14:48:27 +0200 (CEST)
From:	"Rick van Rein" <vanrein@zonnet.nl>

Hi!

BadRAM is my kernel patch to handle broken RAM modules gracefully under Linux.
I received *many* positive replies to this patch, and several hints.

This is V3 of the patch, which I think is quite mature now, safe for inclusion
in kernel 2.2.14. Furthermore, I'll try to summarise the private discussions.

Changes:
 * A minor bug in the default-added mask parameter was reported and repaired.
   This bug caused a boot-time kernel freeze for an odd count of badram-parms.
 * BadRAM now comes as a kernel compilation option for i386: CONFIG_BADRAM.
 * Changed PG_badram bit to avoid overlap with PG_bigmem.
 * (not in the kernel) I extended memtest86, an excellent RAM checker by Chris
   Brady, to print the badram= boot options for my patch. Saves a lot of binary
   calculations to come up with the patterns! Memtest86 can boot from LILO. 
 * (not in the kernel) I had to patch LILO 0.21 to avoid cutting off command
   lines after the 78th character, while the kernel accepts 255!
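
To give an idea of the "binary calculations" behind the badram= patterns, here
is a minimal Python sketch. It is not taken from the patch or from memtest86.
It assumes that the options come as address,mask pairs, that an address is bad
when it agrees with the pattern address on every bit set in the mask, and that
a missing last mask defaults to 0xffffffff. The names parse_badram, is_bad and
pattern_for are made up for this illustration.

```python
# Sketch only: the address/mask semantics are an assumption about how
# badram= pairs are interpreted, not code taken from the patch.

DEFAULT_MASK = 0xFFFFFFFF  # assumed default: match exactly one address


def parse_badram(option):
    """Turn 'badram=addr,mask,addr,mask,...' into a list of (addr, mask) pairs."""
    values = [int(v, 0) for v in option.split("=", 1)[1].split(",")]
    if len(values) % 2:  # odd count: the last mask was left out
        values.append(DEFAULT_MASK)
    return list(zip(values[0::2], values[1::2]))


def is_bad(address, pairs):
    """An address is bad if it agrees with some pattern on every masked bit."""
    return any((address & mask) == (addr & mask) for addr, mask in pairs)


def pattern_for(faulty):
    """Find one (addr, mask) pair that covers all given faulty addresses."""
    base = faulty[0]
    differing = 0
    for a in faulty:
        differing |= a ^ base  # collect every bit on which the addresses differ
    mask = DEFAULT_MASK & ~differing
    return base & mask, mask
```

A single pair found this way may also cover healthy addresses that happen to
share the masked bits; that is the price of a short command line.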

Todo:
 * Use lmbench to prove that BadRAM has *no* performance impact at all.
 * Incorporate non-i386 support as soon as owners of such machines help out.
 * See if a 15M-16M hole can be ignored/handled/... with this patch.
 * Seek integration with either 2.3.x or 2.4.0.
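
For the 15M-16M hole item, the arithmetic for a covering pattern can be
sketched as follows. Whether the patch can actually treat such a hole this way
is exactly the open question above; this only shows the numbers, assuming
address,mask pairs where the mask selects the address bits that must match.
The helper pattern_for_aligned_range is hypothetical.

```python
# Sketch only: assumes badram= takes address,mask pairs and that the
# mask selects the address bits that must match.

MB = 1 << 20


def pattern_for_aligned_range(start, size):
    """Return (addr, mask) covering [start, start + size).

    Only works when size is a power of two and start is a multiple of size.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if start % size:
        raise ValueError("start must be aligned to size")
    mask = 0xFFFFFFFF & ~(size - 1)  # low bits inside the range are "don't care"
    return start, mask


addr, mask = pattern_for_aligned_range(15 * MB, 1 * MB)
print("badram=0x%08x,0x%08x" % (addr, mask))  # badram=0x00f00000,0xfff00000
```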

Discussions:
 * Build a RAM checker into the kernel. Not done, would be hard to cover all.
   The boot alternative for memtest86 from LILO seems to be the best solution.
 * Concentrate on the matter of statically damaged RAMs instead of manufacturing
   mistakes. I think BadRAM can do both.
 * Environmental. Fall-out in a mature chip baking process is 10-60%, exp 50-95%.
   RAM chips may be different, as they incorporate some hardware redundancy.
   Either way, chip production requires loads of toxic materials and energy.
   Every penny saved is worth a dime (to me).
 * Laptops sometimes have RAM soldered in. If it fails... throw away your
   laptop or apply BadRAM. Quite a money-saver there!

References:
 * BadRAM home page                         http://home.zonnet.nl/vanrein/badram
 * Memtest86 home page            http://reality.sgi.com/cbrady_denver/memtest86
 * BadRAM patch for 2.2.14  http://home.zonnet.nl/vanrein/badram/badram-patch.V3
 * LILO 0.21 micropatch     http://home.zonnet.nl/vanrein/badram/lilo-0.21-patch
 * Memtest86 2.2 patch  http://home.zonnet.nl/vanrein/badram/memtest86-2.2-patch

Acknowledgements:
 * Val Henson for digging up figures on drop-out of chip manufacture.
 * Jeff Hildebrand for lots of testing remarks and the odd-parms bug report.
 * Uwe Klein for remarks on PG_bigmem and the odd-parms bug report.
 * Rogier Wolff for pointing out that modern RAMs have on-chip redundancy.
 * Peter F. Curran for pointing out the 15M-16M hole thing.
 * Dennis Vierkant for helping me get testing material: broken DIMMs.
 * Luuk van der Duim and many others for keeping me enthusiastic about BadRAM.
 * Y'all for Linux. It's more stable to hack than using Word under other OS's!

Enjoy,
 -Rick.
