Date:   Thu, 11 Nov 1999 09:53:12 -0600
From:   David Winchell <winchell@missioncriticallinux.com>
To:     "linux-kernel@vger.rutgers.edu" <linux-kernel@vger.rutgers.edu>
Subject: Memory based kernel crash dump

Hello,

I have been working on a kernel crash dump facility that does not rely on the
disk subsystem at crash time.  Instead, the dump is saved in memory when the crash occurs
and written to a file on the subsequent boot.

The save at crash time works by selecting the pages that are not free, user anonymous,
user shared, or file page cache pages, and compressing them into pages that lie above a
certain address, lie at least a certain distance from the end of memory, are not locked,
and are themselves free, user anonymous, user shared, or file page cache pages.
A reboot is then requested with the option to preserve memory.
Early in the boot process, the non-contiguous pages containing the dump are copied to
contiguous pages at the end of memory.  Later in the boot process, they are written to a
file and freed.  On a 96M machine, the compressed dump was 4M.
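The page selection and the boot-time gather step above can be sketched in C as a pair of predicates plus a copy loop. This is a minimal user-space model, not the author's code: the page categories come from the post, but `struct page_info`, `min_pfn`, `reserve_pages`, and the assumption that the distance from the end of memory exists to keep target pages out of the area the next boot copies into are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page categories; the post names these kinds of pages
 * but not the kernel structures used to identify them. */
enum page_kind { PK_FREE, PK_USER_ANON, PK_USER_SHARED, PK_PAGE_CACHE, PK_KERNEL };

struct page_info {
    enum page_kind kind;
    bool locked;
};

/* Free, user anonymous, user shared and page cache pages are not saved,
 * so their contents may be overwritten at crash time. */
static bool is_expendable(const struct page_info *p)
{
    return p->kind == PK_FREE || p->kind == PK_USER_ANON ||
           p->kind == PK_USER_SHARED || p->kind == PK_PAGE_CACHE;
}

/* Crash time: every page that is NOT expendable goes into the dump. */
static bool is_dump_source(const struct page_info *p)
{
    return !is_expendable(p);
}

/* Crash time: an expendable, unlocked page at or above min_pfn that lies
 * at least reserve_pages below the end of memory may receive compressed
 * data. (Assumption: the gap keeps targets out of the reserved tail that
 * the next boot copies the dump into.) */
static bool is_dump_target(const struct page_info *p, size_t pfn,
                           size_t nr_pages, size_t min_pfn,
                           size_t reserve_pages)
{
    return is_expendable(p) && !p->locked &&
           pfn >= min_pfn && pfn + reserve_pages < nr_pages;
}

/* Next boot: copy the target pages, in the order they were filled, into
 * n contiguous pages at the very end of memory. */
static void gather_dump(unsigned char *mem, size_t page_size, size_t nr_pages,
                        const size_t *target_pfns, size_t n,
                        size_t reserve_pages)
{
    /* The destination must fit in the reserved tail so that it cannot
     * overlap any target page chosen by is_dump_target(). */
    assert(n <= reserve_pages && reserve_pages <= nr_pages);
    unsigned char *dst = mem + (nr_pages - n) * page_size;
    for (size_t i = 0; i < n; i++)
        memcpy(dst + i * page_size, mem + target_pfns[i] * page_size, page_size);
}
```

Because every target page lies below `nr_pages - reserve_pages`, the contiguous copy at the end of memory cannot overwrite a page that has not been gathered yet, which is why the sketch asserts that the dump fits in the reserved tail.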

Scratch memory is reserved at boot time for crash dump use.  I use about 2M for this,
though the amount can be tuned smaller.  This ensures that a dump can be taken even
when free memory is very low.
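A reserved scratch area like this could be handed out at crash time by a simple bump allocator, so that taking a dump never depends on the normal page allocator finding free memory. The sketch below models the idea with a static array; the 2M size is from the post, while `scratch_alloc` and its never-free design are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size taken from the post ("about 2M"); tunable. */
#define SCRATCH_SIZE (2u * 1024 * 1024)

/* Stand-in for memory reserved at boot; a real kernel would set this
 * aside before the page allocator takes ownership of physical memory. */
static unsigned char scratch_area[SCRATCH_SIZE];
static size_t scratch_used;

/* Crash-time allocation: bump an offset, never free, never block.
 * align must be a power of two. Returns NULL when the area is exhausted. */
static void *scratch_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t base = (uintptr_t)scratch_area;
    uintptr_t cur = (base + scratch_used + align - 1) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(cur - base);
    if (offset > SCRATCH_SIZE || size > SCRATCH_SIZE - offset)
        return NULL;
    scratch_used = offset + size;
    return scratch_area + offset;
}
```

Never freeing keeps the allocator trivially safe to call from a crashed kernel; the cost is that the reserved size must cover the worst-case working memory of the dump code.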

For example, here is a stack trace of a crash in interrupt context,
a case that can be difficult for disk-based solutions:


crash> bt
PID: 286  TASK: c0b3a000  CPU: 0  COMMAND: "in.rlogind"
#0 [c0b3be90] crash_save_current_state at c011aed0
   (c0b3a000,c08e4190,4000001,c0b3bee8,tulip_interrupt+0x2c)
#1 [c0b3bea4] panic+0xac at c011367c
   (media_cap+0x1446,c08e4190,4000001,9,5a8)
#2 [c0b3bee8] tulip_interrupt+0x2c at c01bc820
   (9,eth0_dev,c0b3bf44,irq_desc+0x90,9)
#3 [c0b3bf08] handle_IRQ_event+0x2d at c010a551
   (9,c0b3bf44,c08e4190)
#4 [c0b3bf2c] do_8259A_IRQ+0x75 at c010a319
   (9,c0b3bf44,c0b3bfbc,ret_from_intr,c0e68280)
#5 [c0b3bf3c] do_IRQ+0x23 at c010a653
   (c0e68280,0,4,4,c0e68284)
#6 [c0b3bfbc] ret_from_intr at c0109634
   (4,bfffc9a0,0,bfffc8a0,0)
#7 [bfffd224] system_call+0x34 at c0109598

For this test crash, I used a system call to set a flag that instructed the tulip
interrupt handler to call panic().

Now the request for help.  Some BIOSes (Dell, NEC) clear memory on reboot
even when the flags to skip the memory test or to preserve memory are set.  Others (HP)
do not clear memory.  Can someone point me to BIOS developers at Dell, Phoenix, or other
manufacturers so that I can lobby for a flag that tells the BIOS to preserve
the contents of memory?
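Since some BIOSes clear memory on reboot, the boot-time code has to tell a preserved dump from wiped or random memory. One common approach, shown here as an assumption rather than as the author's format, is to stamp the dump with a magic number and a checksum at crash time and check both on the next boot; a BIOS that zeroes RAM leaves no valid magic behind. `struct dump_header`, `DUMP_MAGIC`, and the choice of 32-bit FNV-1a as the checksum are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker; any value unlikely to appear by chance works. */
#define DUMP_MAGIC 0x44554d50u /* "DUMP" */

struct dump_header {
    uint32_t magic;
    uint32_t length;   /* bytes of dump data covered by the checksum */
    uint32_t checksum; /* FNV-1a over the dump data */
};

/* 32-bit FNV-1a hash, used here only as an integrity check. */
static uint32_t fnv1a(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Crash time: stamp the header after the dump data has been written. */
static void dump_seal(struct dump_header *hdr, const unsigned char *data,
                      size_t len)
{
    assert(len <= UINT32_MAX); /* length must fit the header field */
    hdr->magic = DUMP_MAGIC;
    hdr->length = (uint32_t)len;
    hdr->checksum = fnv1a(data, len);
}

/* Next boot: accept the dump only if the header and data look intact.
 * max_len bounds the length field in case the header itself is garbage. */
static bool dump_valid(const struct dump_header *hdr,
                       const unsigned char *data, uint32_t max_len)
{
    return hdr->magic == DUMP_MAGIC && hdr->length <= max_len &&
           fnv1a(data, hdr->length) == hdr->checksum;
}
```

If the check fails, the boot code would simply skip the dump and release the memory as usual.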

If anyone is interested in trying my code I'd be glad to make it available today
or tomorrow.


thanks
Dave

