Date: Thu, 11 Nov 1999 09:53:12 -0600
From: David Winchell <winchell@missioncriticallinux.com>
To: "linux-kernel@vger.rutgers.edu" <linux-kernel@vger.rutgers.edu>
Subject: Memory based kernel crash dump

Hello,

I have been working on a kernel crash dump that does not rely on the disk subsystems during the crash. Instead, the dump is saved in memory at crash time and then written to a file on the subsequent boot.

The save at crash time works by selecting pages that are not in [free, user anon, user shared, file page cache] and compressing them into pages that are:

    - above a certain address,
    - a certain distance from the end of memory,
    - not locked, and
    - members of [free, user anon, user shared, file page cache].

A reboot is then requested with the option to preserve memory. Early in the boot process, the non-contiguous pages containing the dump are copied to contiguous pages at the end of memory. Later in the boot process, they are written to a file and freed.

On a 96M machine, the compressed dump was 4M.

Scratch memory is reserved at boot time for crash dump use. I use about 2M for this, though the amount can be tuned lower. This ensures that a dump can be taken even when free memory is very low.
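The crash-time save and boot-time copy described above can be illustrated with a small user-space C simulation. This is a sketch under stated assumptions, not the actual patch: the page classes are a hypothetical `enum page_kind`, a plain `memcpy` stands in for compression (so each dump page fills one destination page), "above a certain address" and "a certain distance from the end of memory" are modeled by hypothetical `low_limit` and `tail_gap` parameters, and the `dest[]` map of destination PFNs is an ordinary array (the post does not say how that map is located again after reboot). Reading `tail_gap` as the space that keeps destination pages clear of the final contiguous region is an interpretation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 16   /* tiny pages, for the simulation only */

/* Hypothetical page classes standing in for the kernel's bookkeeping. */
enum page_kind { PG_KERNEL, PG_FREE, PG_USER_ANON, PG_USER_SHARED, PG_PAGE_CACHE };

struct sim_page {
    enum page_kind kind;
    bool locked;
    unsigned char data[SIM_PAGE_SIZE];
};

/* Pages in [free, user anon, user shared, file page cache]:
 * they are not dumped, so they may be overwritten with dump data. */
static bool is_expendable(enum page_kind k)
{
    return k == PG_FREE || k == PG_USER_ANON ||
           k == PG_USER_SHARED || k == PG_PAGE_CACHE;
}

/* Crash time: copy every non-expendable page into an eligible destination
 * page (PFN >= low_limit, PFN + tail_gap < nr_pages, unlocked, expendable).
 * memcpy stands in for compression. Destination PFNs are recorded in
 * dest[], which must hold at least tail_gap entries. Returns the number of
 * destination pages used, or -1 if the dump does not fit. */
static int save_dump(struct sim_page *mem, size_t nr_pages,
                     size_t low_limit, size_t tail_gap, size_t *dest)
{
    size_t used = 0;
    size_t cand = low_limit;            /* next candidate destination PFN */

    for (size_t src = 0; src < nr_pages; src++) {
        if (is_expendable(mem[src].kind))
            continue;                   /* not part of the dump */

        /* Skip candidates that are locked or hold dumpable data. */
        while (cand + tail_gap < nr_pages &&
               (mem[cand].locked || !is_expendable(mem[cand].kind)))
            cand++;

        /* Out of destination pages, or the gathered dump would no longer
         * fit in the last tail_gap pages of memory. */
        if (cand + tail_gap >= nr_pages || used == tail_gap)
            return -1;

        memcpy(mem[cand].data, mem[src].data, SIM_PAGE_SIZE);
        dest[used++] = cand++;
    }
    return (int)used;
}

/* Early boot: copy the scattered dump pages, in order, into a contiguous
 * run at the end of memory. Every dest PFN is below nr_pages - tail_gap and
 * used <= tail_gap, so the target run never overlaps a page still to be
 * read. Returns the first PFN of the contiguous run. */
static size_t gather_dump(struct sim_page *mem, size_t nr_pages,
                          const size_t *dest, size_t used)
{
    size_t start = nr_pages - used;

    for (size_t i = 0; i < used; i++)
        memcpy(mem[start + i].data, mem[dest[i]].data, SIM_PAGE_SIZE);
    return start;
}
```

Keeping every destination page out of the last `tail_gap` pages is what lets the boot-time gather run as a single forward pass: the contiguous target region can never clobber a dump page that has not been copied yet.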
For example, here is a stack trace of a crash in interrupt context, a case that can be difficult for disk-based solutions:

    crash> bt
    PID: 286  TASK: c0b3a000  CPU: 0  COMMAND: "in.rlogind"
     #0 [c0b3be90] crash_save_current_state at c011aed0
        (c0b3a000,c08e4190,4000001,c0b3bee8,tulip_interrupt+0x2c)
     #1 [c0b3bea4] panic+0xac at c011367c
        (media_cap+0x1446,c08e4190,4000001,9,5a8)
     #2 [c0b3bee8] tulip_interrupt+0x2c at c01bc820
        (9,eth0_dev,c0b3bf44,irq_desc+0x90,9)
     #3 [c0b3bf08] handle_IRQ_event+0x2d at c010a551
        (9,c0b3bf44,c08e4190)
     #4 [c0b3bf2c] do_8259A_IRQ+0x75 at c010a319
        (9,c0b3bf44,c0b3bfbc,ret_from_intr,c0e68280)
     #5 [c0b3bf3c] do_IRQ+0x23 at c010a653
        (c0e68280,0,4,4,c0e68284)
     #6 [c0b3bfbc] ret_from_intr at c0109634
        (4,bfffc9a0,0,bfffc8a0,0)
     #7 [bfffd224] system_call+0x34 at c0109598

For this test crash, I set a flag with a system call that instructed the tulip interrupt handler to call panic().

Now the request for help. Some BIOSes (Dell, NEC) clear memory on reboot even when the flags to skip the memory test or to preserve memory are set. Others (HP) do not clear memory. Can someone point me to BIOS developers at Dell, Phoenix, or other manufacturers, so that I can lobby for a flag I can pass to the BIOS to make it preserve the contents of memory?

If anyone is interested in trying my code, I'd be glad to make it available today or tomorrow.

thanks
Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/