[PATCH] Updated kdump documentation

Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
author: David Wilder <dwilder@us.ibm.com> 2006-06-25 08:47:55 -0400
committer: Linus Torvalds <torvalds@g5.osdl.org> 2006-06-25 13:01:08 -0400
commit: dc851a0fd2736e8dc3e90bd990cb911a0013da67 (patch)
tree: 715f381d67be16d27fd656eb1dd8d5dd3a52c10a /Documentation
parent: 8ea2c2ecfcc1f31eaba8d1995b2e734ba821806a (diff)
1 files changed, 295 insertions, 125 deletions
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 212cf3c21abf..08bafa8c1caa 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -1,155 +1,325 @@
-Documentation for kdump - the kexec-based crash dumping solution
+================================================================
+Documentation for Kdump - The kexec-based Crash Dumping Solution
 ================================================================
-DESIGN
+This document includes overview, setup and installation, and analysis
-======
+information.
-Kdump uses kexec to reboot to a second kernel whenever a dump needs to be
+Overview
-taken. This second kernel is booted with very little memory. The first kernel
+========
-reserves the section of memory that the second kernel uses. This ensures that
-on-going DMA from the first kernel does not corrupt the second kernel.
-All the necessary information about Core image is encoded in ELF format and
+Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
-stored in reserved area of memory before crash. Physical address of start of
+dump of the system kernel's memory needs to be taken (for example, when
-ELF header is passed to new kernel through command line parameter elfcorehdr=.
+the system panics). The system kernel's memory image is preserved across
+the reboot and is accessible to the dump-capture kernel.
-On i386, the first 640 KB of physical memory is needed to boot, irrespective
+You can use common Linux commands, such as cp and scp, to copy the
-of where the kernel loads. Hence, this region is backed up by kexec just before
+memory image to a dump file on the local disk, or across the network to
-rebooting into the new kernel.
+a remote system.
-In the second kernel, "old memory" can be accessed in two ways.
+Kdump and kexec are currently supported on the x86, x86_64, and ppc64
+architectures.
- The first one is through a /dev/oldmem device interface. A capture utility
+When the system kernel boots, it reserves a small section of memory for
-  can read the device file and write out the memory in raw format. This is raw
+the dump-capture kernel. This ensures that ongoing Direct Memory Access
-  dump of memory and analysis/capture tool should be intelligent enough to
+(DMA) from the system kernel does not corrupt the dump-capture kernel.
-  determine where to look for the right information. ELF headers (elfcorehdr=)
+The kexec -p command loads the dump-capture kernel into this reserved
-  can become handy here.
+memory.
- The second interface is through /proc/vmcore. This exports the dump as an ELF
+On x86 machines, the first 640 KB of physical memory is needed to boot,
-  format file which can be written out using any file copy command
+regardless of where the kernel loads. Therefore, kexec backs up this
-  (cp, scp, etc). Further, gdb can be used to perform limited debugging on
+region just before rebooting into the dump-capture kernel.
-  the dump file. This method ensures methods ensure that there is correct
-  ordering of the dump pages (corresponding to the first 640 KB that has been
-  relocated).
-SETUP
+All of the necessary information about the system kernel's core image is
-=====
+encoded in the ELF format, and stored in a reserved area of memory
+before a crash. The physical address of the start of the ELF header is
+passed to the dump-capture kernel through the elfcorehdr= boot
+parameter.
+With the dump-capture kernel, you can access the memory image, or "old
+memory," in two ways:
+- Through a /dev/oldmem device interface. A capture utility can read the
+  device file and write out the memory in raw format. This is a raw dump
+  of memory. Analysis and capture tools must be intelligent enough to
+  determine where to look for the right information.
+- Through /proc/vmcore. This exports the dump as an ELF-format file that
+  you can write out using file copy commands such as cp or scp. Further,
+  you can use analysis tools such as the GNU Debugger (GDB) and the Crash
+  tool to debug the dump file. This method ensures that the dump pages are
+  correctly ordered.
+Setup and Installation
+======================
+Install kexec-tools and the Kdump patch
+---------------------------------------
+1) Log in as the root user.
+2) Download the kexec-tools user-space package from the following URL:
+   http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
+3) Unpack the tarball with the tar command, as follows:
+   tar xvpzf kexec-tools-1.101.tar.gz
+4) Download the latest consolidated Kdump patch from the following URL:
+   http://lse.sourceforge.net/kdump/
+   (This location is being used until all the user-space Kdump patches
+   are integrated with the kexec-tools package.)
+5) Change to the kexec-tools-1.101 directory, as follows:
+   cd kexec-tools-1.101
+6) Apply the consolidated patch to the kexec-tools-1.101 source tree
+   with the patch command, as follows. (Modify the path to the downloaded
+   patch as necessary.)
+   patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch
+7) Configure the package, as follows:
+   ./configure
+8) Compile the package, as follows:
+   make
+9) Install the package, as follows:
+   make install
+Download and build the system and dump-capture kernels
+------------------------------------------------------
+Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer)
+from http://www.kernel.org. Two kernels must be built: a system kernel
+and a dump-capture kernel. Use the following steps to configure these
+kernels with the necessary kexec and Kdump features:
+System kernel
+-------------
+1) Enable "kexec system call" in "Processor type and features."
+   CONFIG_KEXEC=y
+2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
+   filesystems." This is usually enabled by default.
+   CONFIG_SYSFS=y
+   Note that "sysfs file system support" might not appear in the "Pseudo
+   filesystems" menu if "Configure standard kernel features (for small
+   systems)" is not enabled in "General Setup." In this case, check the
+   .config file itself to ensure that sysfs is turned on, as follows:
+   grep 'CONFIG_SYSFS' .config
+3) Enable "Compile the kernel with debug info" in "Kernel hacking."
+   CONFIG_DEBUG_INFO=y
+   This causes the kernel to be built with debug symbols. The dump
+   analysis tools require a vmlinux with debug symbols in order to read
+   and analyze a dump file.
+4) Make and install the kernel and its modules. Update the boot loader
+   (such as grub, yaboot, or lilo) configuration files as necessary.
+5) Boot the system kernel with the boot parameter "crashkernel=Y@X",
+   where Y specifies how much memory to reserve for the dump-capture kernel
+   and X specifies the beginning of this reserved memory. For example,
+   "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
+   starting at physical address 0x01000000 for the dump-capture kernel.
+   On x86 and x86_64, use "crashkernel=64M@16M".
+   On ppc64, use "crashkernel=128M@32M".
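The Y@X arithmetic in step 5 can be checked with a small shell sketch. The helper names to_bytes and decode_crashkernel are hypothetical and not part of kexec-tools or the kernel; the sketch only decodes the string and does not reserve anything.

```shell
# Hypothetical helper: convert a size with a K, M, or G suffix to bytes.
to_bytes() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# Hypothetical helper: split a Y@X value into the size (Y) and start (X).
decode_crashkernel() {
    size=${1%@*}     # text before the "@"
    start=${1#*@}    # text after the "@"
    printf 'reserve %d bytes at 0x%08x\n' "$(to_bytes "$size")" "$(to_bytes "$start")"
}

decode_crashkernel 64M@16M
# prints: reserve 67108864 bytes at 0x01000000
```

The 16M start decodes to 0x01000000, which matches the physical address given in step 5.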
+The dump-capture kernel
+-----------------------
-1) Download the upstream kexec-tools userspace package from
+1) Under "General setup," append "-kdump" to the current string in
-   http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz.
+   "Local version."
-   Apply the latest consolidated kdump patch on top of kexec-tools-1.101
+2) On x86, enable high memory support under "Processor type and
-   from http://lse.sourceforge.net/kdump/. This arrangment has been made
+   features":
-   till all the userspace patches supporting kdump are integrated with
-   upstream kexec-tools userspace.
+   CONFIG_HIGHMEM64G=y
+   or
-2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels.
+   CONFIG_HIGHMEM4G=y
-   Two kernels need to be built in order to get this feature working.
-   Following are the steps to properly configure the two kernels specific
+3) On x86 and x86_64, disable symmetric multi-processing support
-   to kexec and kdump features:
+   under "Processor type and features":
-  A) First kernel or regular kernel:
+   CONFIG_SMP=n
-  ----------------------------------
+   (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
-   a) Enable "kexec system call" feature (in Processor type and features).
+   when loading the dump-capture kernel; see the section "Load the Dump-capture
-      CONFIG_KEXEC=y
+   Kernel".)
-   b) Enable "sysfs file system support" (in Pseudo filesystems).
-      CONFIG_SYSFS=y
+4) On ppc64, disable NUMA support and enable EMBEDDED support:
-   c) make
-   d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
+   CONFIG_NUMA=n
-      Use appropriate values for X and Y. Y denotes how much memory to reserve
+   CONFIG_EMBEDDED=y
-      for the second kernel, and X denotes at what physical address the
+   CONFIG_EEH=n for the dump-capture kernel
-      reserved memory section starts. For example: "crashkernel=64M@16M".
+5) Enable "kernel crash dumps" support under "Processor type and
+   features":
-  B) Second kernel or dump capture kernel:
-  ---------------------------------------
+   CONFIG_CRASH_DUMP=y
-   a) For i386 architecture enable Highmem support
-      CONFIG_HIGHMEM=y
+6) Use a suitable value for "Physical address where the kernel is
-   b) Enable "kernel crash dumps" feature (under "Processor type and features")
+   loaded" (under "Processor type and features"). This only appears when
-      CONFIG_CRASH_DUMP=y
+   "kernel crash dumps" is enabled. By default this value is 0x1000000
-   c) Make sure a suitable value for "Physical address where the kernel is
+   (16MB). It should be the same as X in the "crashkernel=Y@X" boot
-      loaded" (under "Processor type and features"). By default this value
+   parameter discussed above.
-      is 0x1000000 (16MB) and it should be same as X (See option d above),
-      e.g., 16 MB or 0x1000000.
+   On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000".
-      CONFIG_PHYSICAL_START=0x1000000
-   d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems").
+   On ppc64 the value is automatically set at 32MB when
-      CONFIG_PROC_VMCORE=y
+   CONFIG_CRASH_DUMP is set.
-3) After booting to regular kernel or first kernel, load the second kernel
+7) Optionally enable "/proc/vmcore support" under "Filesystems" ->
-   using the following command:
+   "Pseudo filesystems".
-   kexec -p <second-kernel> --args-linux --elf32-core-headers
+   CONFIG_PROC_VMCORE=y
-   --append="root=<root-dev> init 1 irqpoll maxcpus=1"
+   (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
-   Notes:
+8) Make and install the kernel and its modules. DO NOT add this kernel
-   ======
+   to the boot loader configuration files.
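Before building the dump-capture kernel, the options listed in the steps above can be checked against a .config file, in the same spirit as the grep check shown for CONFIG_SYSFS. This is a minimal sketch: the function name missing_options and the temporary sample file are assumptions made for illustration, and only the CONFIG_ names come from the document.

```shell
# Hypothetical helper: print every listed option that is not set to "y"
# in the given kernel configuration file.
missing_options() {
    config=$1
    shift
    for opt in "$@"; do
        grep -q "^${opt}=y\$" "$config" || echo "$opt"
    done
}

# Demonstration against a small sample file instead of a real .config.
sample=$(mktemp)
printf 'CONFIG_CRASH_DUMP=y\n# CONFIG_PROC_VMCORE is not set\n' > "$sample"
missing_options "$sample" CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE
# prints: CONFIG_PROC_VMCORE
rm -f "$sample"
```

Against a real tree, the call would look like "missing_options .config CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE"; empty output means every listed option is enabled.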
-     i) <second-kernel> has to be a vmlinux image ie uncompressed elf image.
-        bzImage will not work, as of now.
-    ii) --args-linux has to be speicfied as if kexec it loading an elf image,
+Load the Dump-capture Kernel
-        it needs to know that the arguments supplied are of linux type.
+============================
-   iii) By default ELF headers are stored in ELF64 format to support systems
-        with more than 4GB memory. Option --elf32-core-headers forces generation
+After booting to the system kernel, load the dump-capture kernel using
-        of ELF32 headers. The reason for this option being, as of now gdb can
+the following command:
-        not open vmcore file with ELF64 headers on a 32 bit systems. So ELF32
-        headers can be used if one has non-PAE systems and hence memory less
+   kexec -p <dump-capture-kernel> \
-        than 4GB.
+   --initrd=<initrd-for-dump-capture-kernel> --args-linux \
-    iv) Specify "irqpoll" as command line parameter. This reduces driver
+   --append="root=<root-dev> init 1 irqpoll"
-         initialization failures in second kernel due to shared interrupts.
-     v) <root-dev> needs to be specified in a format corresponding to the root
-        device name in the output of mount command.
+Notes on loading the dump-capture kernel:
-    vi) If you have built the drivers required to mount root file system as
-        modules in <second-kernel>, then, specify
+* <dump-capture-kernel> must be a vmlinux image (that is, an
-        --initrd=<initrd-for-second-kernel>.
+  uncompressed ELF image). bzImage does not work at this time.
-   vii) Specify maxcpus=1 as, if during first kernel run, if panic happens on
-        non-boot cpus, second kernel doesn't seem to be boot up all the cpus.
+* By default, the ELF headers are stored in ELF64 format to support
-        The other option is to always built the second kernel without SMP
+  systems with more than 4GB memory. The --elf32-core-headers option can
-        support ie CONFIG_SMP=n
+  be used to force the generation of ELF32 headers. This is necessary
+  because GDB currently cannot open vmcore files with ELF64 headers on
-4) After successfully loading the second kernel as above, if a panic occurs
+  32-bit systems. ELF32 headers can be used on non-PAE systems (that is,
-   system reboots into the second kernel. A module can be written to force
+  systems with less than 4GB of memory).
-   the panic or "ALT-SysRq-c" can be used initiate a crash dump for testing
-   purposes.
+* The "irqpoll" boot parameter reduces driver initialization failures
+  due to shared interrupts in the dump-capture kernel.
-5) Once the second kernel has booted, write out the dump file using
+* You must specify <root-dev> in the format corresponding to the root
+  device name in the output of the mount command.
+* "init 1" boots the dump-capture kernel into single-user mode without
+  networking. If you want networking, use "init 3."
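The load command and the notes above can be combined into a small sketch that assembles the kexec command line and prints it rather than running it. The function name kexec_load_cmd and the example paths are hypothetical; only the options -p, --initrd, --args-linux, and --append, and the boot parameters, come from the document. The optional fourth argument covers the maxcpus=1 case described for a dump-capture kernel built with CONFIG_SMP=y.

```shell
# Hypothetical helper: print (not run) the command that loads the
# dump-capture kernel.
# Arguments: vmlinux image, initrd, root device, optional extra boot parameters.
kexec_load_cmd() {
    kernel=$1 initrd=$2 rootdev=$3 extra=${4:-}
    # Append the extra parameters only when they were supplied.
    append="root=$rootdev init 1 irqpoll${extra:+ $extra}"
    printf 'kexec -p %s --initrd=%s --args-linux --append="%s"\n' \
        "$kernel" "$initrd" "$append"
}

# Example with placeholder paths; adjust them to the local system.
kexec_load_cmd /boot/vmlinux-kdump /boot/initrd-kdump.img /dev/sda1 maxcpus=1
```

Printing the command first makes it easy to review the boot parameters before loading anything into the reserved memory.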
+Kernel Panic
+============
+After successfully loading the dump-capture kernel as previously
+described, the system will reboot into the dump-capture kernel if a
+system crash is triggered.  Trigger points are located in panic(),
+die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
+The following conditions will execute a crash trigger point:
+If a hard lockup is detected and "NMI watchdog" is configured, the system
+will boot into the dump-capture kernel (die_nmi()).
+If die() is called by a thread with pid 0 or 1, or die() is called in
+interrupt context, or die() is called while panic_on_oops is set, the
+system will boot into the dump-capture kernel.
+On powerpc systems, when a soft-reset is generated, die() is called by
+all CPUs and the system will boot into the dump-capture kernel.
+For testing purposes, you can trigger a crash by using "ALT-SysRq-c" or
+"echo c > /proc/sysrq-trigger", or by writing a module to force the panic.
+Write Out the Dump File
+=======================
+After the dump-capture kernel is booted, write out the dump file with
+the following command:
   cp /proc/vmcore <dump-file>
-   Dump memory can also be accessed as a /dev/oldmem device for a linear/raw
+You can also access dumped memory as a /dev/oldmem device for a linear
-   view.  To create the device, type:
+and raw view. To create the device, use the following command:
-   mknod /dev/oldmem c 1 12
+    mknod /dev/oldmem c 1 12
-   Use "dd" with suitable options for count, bs and skip to access specific
+Use the dd command with suitable options for count, bs, and skip to
-   portions of the dump.
+access specific portions of the dump.
-   Entire memory:  dd if=/dev/oldmem of=oldmem.001
+To see the entire memory, use the following command:
+   dd if=/dev/oldmem of=oldmem.001
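The count, bs, and skip options mentioned above can be derived from a physical address range. The sketch below prints the dd command instead of running it. It assumes that offsets into /dev/oldmem correspond to physical addresses in the old memory, which is how the linear view is described here; the function name oldmem_dd_cmd, the 4096-byte block size, and the output file name are illustrative choices.

```shell
# Hypothetical helper: print (not run) a dd command that copies LENGTH
# bytes starting at physical address START out of /dev/oldmem.
# START and LENGTH are decimal byte counts and should be multiples of 4096.
oldmem_dd_cmd() {
    start=$1 length=$2 out=$3
    bs=4096
    echo "dd if=/dev/oldmem of=$out bs=$bs skip=$(( start / bs )) count=$(( length / bs ))"
}

# Example: the first 1 MB of the region that starts at 16 MB.
oldmem_dd_cmd $(( 16 * 1024 * 1024 )) $(( 1024 * 1024 )) oldmem.part
# prints: dd if=/dev/oldmem of=oldmem.part bs=4096 skip=4096 count=256
```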
-ANALYSIS
+Analysis
 ========
-Limited analysis can be done using gdb on the dump file copied out of
-/proc/vmcore. Use vmlinux built with -g and run
-  gdb vmlinux <dump-file>
+Before analyzing the dump image, you should reboot into a stable kernel.
+You can do limited analysis using GDB on the dump file copied out of
+/proc/vmcore. Use the debug vmlinux built with -g and run the following
+command:
+   gdb vmlinux <dump-file>
-Stack trace for the task on processor 0, register display, memory display
+Stack trace for the task on processor 0, register display, and memory
-work fine.
+display work fine.
-Note: gdb cannot analyse core files generated in ELF64 format for i386.
+Note: GDB cannot analyze core files generated in ELF64 format for x86.
+On systems with a maximum of 4GB of memory, you can generate
+ELF32-format headers by passing the --elf32-core-headers option to kexec
+when loading the dump-capture kernel.
-Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site
+You can also use the Crash utility to analyze dump files in Kdump
-http://people.redhat.com/~anderson/ works well with kdump format.
+format. Crash is available on Dave Anderson's site at the following URL:
+   http://people.redhat.com/~anderson/
+To Do
+=====
-TODO
+1) Provide a kernel pages filtering mechanism, so core file size is not
-====
+   extreme on systems with huge memory banks.
-1) Provide a kernel pages filtering mechanism so that core file size is not
-   insane on systems having huge memory banks.
-2) Relocatable kernel can help in maintaining multiple kernels for crashdump
-   and same kernel as the first kernel can be used to capture the dump.
+2) A relocatable kernel can reduce the need to maintain multiple kernels
+   for crash dumps, because the same kernel as the system kernel can be
+   used to capture the dump.
-CONTACT
+Contact
 =======
 Vivek Goyal (vgoyal@in.ibm.com)
 Maneesh Soni (maneesh@in.ibm.com)
+Trademark
+=========
+Linux is a trademark of Linus Torvalds in the United States, other
+countries, or both.
1	Documentation for kdump - the kexec-based crash dumping solution	1	================================================================
		2	Documentation for Kdump - The kexec-based Crash Dumping Solution
2	================================================================	3	================================================================
3		4
4	DESIGN	5	This document includes overview, setup and installation, and analysis
5	======	6	information.
6		7
7	Kdump uses kexec to reboot to a second kernel whenever a dump needs to be	8	Overview
8	taken. This second kernel is booted with very little memory. The first kernel	9	========
9	reserves the section of memory that the second kernel uses. This ensures that
10	on-going DMA from the first kernel does not corrupt the second kernel.
11		10
12	All the necessary information about Core image is encoded in ELF format and	11	Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
13	stored in reserved area of memory before crash. Physical address of start of	12	dump of the system kernel's memory needs to be taken (for example, when
14	ELF header is passed to new kernel through command line parameter elfcorehdr=.	13	the system panics). The system kernel's memory image is preserved across
		14	the reboot and is accessible to the dump-capture kernel.
15		15
16	On i386, the first 640 KB of physical memory is needed to boot, irrespective	16	You can use common Linux commands, such as cp and scp, to copy the
17	of where the kernel loads. Hence, this region is backed up by kexec just before	17	memory image to a dump file on the local disk, or across the network to
18	rebooting into the new kernel.	18	a remote system.
19		19
20	In the second kernel, "old memory" can be accessed in two ways.	20	Kdump and kexec are currently supported on the x86, x86_64, and ppc64
		21	architectures.
21		22
22	- The first one is through a /dev/oldmem device interface. A capture utility	23	When the system kernel boots, it reserves a small section of memory for
23	can read the device file and write out the memory in raw format. This is raw	24	the dump-capture kernel. This ensures that ongoing Direct Memory Access
24	dump of memory and analysis/capture tool should be intelligent enough to	25	(DMA) from the system kernel does not corrupt the dump-capture kernel.
25	determine where to look for the right information. ELF headers (elfcorehdr=)	26	The kexec -p command loads the dump-capture kernel into this reserved
26	can become handy here.	27	memory.
27		28
28	- The second interface is through /proc/vmcore. This exports the dump as an ELF	29	On x86 machines, the first 640 KB of physical memory is needed to boot,
29	format file which can be written out using any file copy command	30	regardless of where the kernel loads. Therefore, kexec backs up this
30	(cp, scp, etc). Further, gdb can be used to perform limited debugging on	31	region just before rebooting into the dump-capture kernel.
31	the dump file. This method ensures methods ensure that there is correct
32	ordering of the dump pages (corresponding to the first 640 KB that has been
33	relocated).
34		32
35	SETUP	33	All of the necessary information about the system kernel's core image is
36	=====	34	encoded in the ELF format, and stored in a reserved area of memory
		35	before a crash. The physical address of the start of the ELF header is
		36	passed to the dump-capture kernel through the elfcorehdr= boot
		37	parameter.
		38
		39	With the dump-capture kernel, you can access the memory image, or "old
		40	memory," in two ways:
		41
		42	- Through a /dev/oldmem device interface. A capture utility can read the
		43	device file and write out the memory in raw format. This is a raw dump
		44	of memory. Analysis and capture tools must be intelligent enough to
		45	determine where to look for the right information.
		46
		47	- Through /proc/vmcore. This exports the dump as an ELF-format file that
		48	you can write out using file copy commands such as cp or scp. Further,
		49	you can use analysis tools such as the GNU Debugger (GDB) and the Crash
		50	tool to debug the dump file. This method ensures that the dump pages are
		51	correctly ordered.
		52
		53
		54	Setup and Installation
		55	======================
		56
		57	Install kexec-tools and the Kdump patch
		58	---------------------------------------
		59
		60	1) Login as the root user.
		61
		62	2) Download the kexec-tools user-space package from the following URL:
		63
		64	http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
		65
		66	3) Unpack the tarball with the tar command, as follows:
		67
		68	tar xvpzf kexec-tools-1.101.tar.gz
		69
		70	4) Download the latest consolidated Kdump patch from the following URL:
		71
		72	http://lse.sourceforge.net/kdump/
		73
		74	(This location is being used until all the user-space Kdump patches
		75	are integrated with the kexec-tools package.)
		76
		77	5) Change to the kexec-tools-1.101 directory, as follows:
		78
		79	cd kexec-tools-1.101
		80
		81	6) Apply the consolidated patch to the kexec-tools-1.101 source tree
		82	with the patch command, as follows. (Modify the path to the downloaded
		83	patch as necessary.)
		84
		85	patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch
		86
		87	7) Configure the package, as follows:
		88
		89	./configure
		90
		91	8) Compile the package, as follows:
		92
		93	make
		94
		95	9) Install the package, as follows:
		96
		97	make install
		98
		99
		100	Download and build the system and dump-capture kernels
		101	------------------------------------------------------
		102
		103	Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer)
		104	from http://www.kernel.org. Two kernels must be built: a system kernel
		105	and a dump-capture kernel. Use the following steps to configure these
		106	kernels with the necessary kexec and Kdump features:
		107
		108	System kernel
		109	-------------
		110
		111	1) Enable "kexec system call" in "Processor type and features."
		112
		113	CONFIG_KEXEC=y
		114
		115	2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
		116	filesystems." This is usually enabled by default.
		117
		118	CONFIG_SYSFS=y
		119
		120	Note that "sysfs file system support" might not appear in the "Pseudo
		121	filesystems" menu if "Configure standard kernel features (for small
		122	systems)" is not enabled in "General Setup." In this case, check the
		123	.config file itself to ensure that sysfs is turned on, as follows:
		124
		125	grep 'CONFIG_SYSFS' .config
		126
		127	3) Enable "Compile the kernel with debug info" in "Kernel hacking."
		128
		129	CONFIG_DEBUG_INFO=Y
		130
		131	This causes the kernel to be built with debug symbols. The dump
		132	analysis tools require a vmlinux with debug symbols in order to read
		133	and analyze a dump file.
		134
		135	4) Make and install the kernel and its modules. Update the boot loader
		136	(such as grub, yaboot, or lilo) configuration files as necessary.
		137
		138	5) Boot the system kernel with the boot parameter "crashkernel=Y@X",
		139	where Y specifies how much memory to reserve for the dump-capture kernel
		140	and X specifies the beginning of this reserved memory. For example,
		141	"crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
		142	starting at physical address 0x01000000 for the dump-capture kernel.
		143
		144	On x86 and x86_64, use "crashkernel=64M@16M".
		145
		146	On ppc64, use "crashkernel=128M@32M".
		147
		148
		149	The dump-capture kernel
		150	-----------------------
37		151
38	1) Download the upstream kexec-tools userspace package from	152	1) Under "General setup," append "-kdump" to the current string in
39	http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz.	153	"Local version."
40		154
41	Apply the latest consolidated kdump patch on top of kexec-tools-1.101	155	2) On x86, enable high memory support under "Processor type and
42	from http://lse.sourceforge.net/kdump/. This arrangment has been made	156	features":
43	till all the userspace patches supporting kdump are integrated with	157
44	upstream kexec-tools userspace.	158	CONFIG_HIGHMEM64G=y
45		159	or
46	2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels.	160	CONFIG_HIGHMEM4G
47	Two kernels need to be built in order to get this feature working.	161
48	Following are the steps to properly configure the two kernels specific	162	3) On x86 and x86_64, disable symmetric multi-processing support
49	to kexec and kdump features:	163	under "Processor type and features":
50		164
51	A) First kernel or regular kernel:	165	CONFIG_SMP=n
52	----------------------------------	166	(If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
53	a) Enable "kexec system call" feature (in Processor type and features).	167	when loading the dump-capture kernel, see section "Load the Dump-capture
54	CONFIG_KEXEC=y	168	Kernel".)
55	b) Enable "sysfs file system support" (in Pseudo filesystems).	169
56	CONFIG_SYSFS=y	170	4) On ppc64, disable NUMA support and enable EMBEDDED support:
57	c) make	171
58	d) Boot into first kernel with the command line parameter "crashkernel=Y@X".	172	CONFIG_NUMA=n
59	Use appropriate values for X and Y. Y denotes how much memory to reserve	173	CONFIG_EMBEDDED=y
60	for the second kernel, and X denotes at what physical address the	174	CONFIG_EEH=N for the dump-capture kernel
61	reserved memory section starts. For example: "crashkernel=64M@16M".	175
62		176	5) Enable "kernel crash dumps" support under "Processor type and
63		177	features":
64	B) Second kernel or dump capture kernel:	178
65	---------------------------------------	179	CONFIG_CRASH_DUMP=y
66	a) For i386 architecture enable Highmem support	180
67	CONFIG_HIGHMEM=y	181	6) Use a suitable value for "Physical address where the kernel is
68	b) Enable "kernel crash dumps" feature (under "Processor type and features")	182	loaded" (under "Processor type and features"). This only appears when
69	CONFIG_CRASH_DUMP=y	183	"kernel crash dumps" is enabled. By default this value is 0x1000000
70	c) Make sure a suitable value for "Physical address where the kernel is	184	(16MB). It should be the same as X in the "crashkernel=Y@X" boot
71	loaded" (under "Processor type and features"). By default this value	185	parameter discussed above.
72	is 0x1000000 (16MB) and it should be same as X (See option d above),	186
73	e.g., 16 MB or 0x1000000.	187	On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000".
74	CONFIG_PHYSICAL_START=0x1000000	188
75	d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems").	189	On ppc64 the value is automatically set at 32MB when
76	CONFIG_PROC_VMCORE=y	190	CONFIG_CRASH_DUMP is set.
77		191
78	3) After booting to regular kernel or first kernel, load the second kernel	192	6) Optionally enable "/proc/vmcore support" under "Filesystems" ->
79	using the following command:	193	"Pseudo filesystems".
80		194
81	kexec -p <second-kernel> --args-linux --elf32-core-headers	195	CONFIG_PROC_VMCORE=y
82	--append="root=<root-dev> init 1 irqpoll maxcpus=1"	196	(CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
83		197
84	Notes:	198	7) Make and install the kernel and its modules. DO NOT add this kernel
85	======	199	to the boot loader configuration files.
86	i) <second-kernel> has to be a vmlinux image ie uncompressed elf image.	200
87	bzImage will not work, as of now.	201
88	ii) --args-linux has to be speicfied as if kexec it loading an elf image,	202	Load the Dump-capture Kernel
89	it needs to know that the arguments supplied are of linux type.	203	============================
90	iii) By default ELF headers are stored in ELF64 format to support systems	204
91	with more than 4GB memory. Option --elf32-core-headers forces generation	205	After booting to the system kernel, load the dump-capture kernel using
92	of ELF32 headers. The reason for this option being, as of now gdb can	206	the following command:
93	not open vmcore file with ELF64 headers on a 32 bit systems. So ELF32	207
94	headers can be used if one has non-PAE systems and hence memory less	208	kexec -p <dump-capture-kernel> \
95	than 4GB.	209	--initrd=<initrd-for-dump-capture-kernel> --args-linux \
96	iv) Specify "irqpoll" as command line parameter. This reduces driver	210	--append="root=<root-dev> init 1 irqpoll"
97	initialization failures in second kernel due to shared interrupts.	211
98	v) <root-dev> needs to be specified in a format corresponding to the root	212
99	device name in the output of mount command.	213	Notes on loading the dump-capture kernel:
100	vi) If you have built the drivers required to mount root file system as	214
101	modules in <second-kernel>, then, specify	215	* <dump-capture-kernel> must be a vmlinux image (that is, an
102	--initrd=<initrd-for-second-kernel>.	216	uncompressed ELF image). bzImage does not work at this time.
103	vii) Specify maxcpus=1 as, if during first kernel run, if panic happens on	217
104	non-boot cpus, second kernel doesn't seem to be boot up all the cpus.	218	* By default, the ELF headers are stored in ELF64 format to support
105	The other option is to always built the second kernel without SMP	219	systems with more than 4GB memory. The --elf32-core-headers option can
106	support ie CONFIG_SMP=n	220	be used to force the generation of ELF32 headers. This is necessary
107		221	because GDB currently cannot open vmcore files with ELF64 headers on
108	4) After successfully loading the second kernel as above, if a panic occurs	222	32-bit systems. ELF32 headers can be used on non-PAE systems (that is,
109	system reboots into the second kernel. A module can be written to force	223	less than 4GB of memory).
110	the panic or "ALT-SysRq-c" can be used initiate a crash dump for testing	224
111	purposes.	225	* The "irqpoll" boot parameter reduces driver initialization failures
112		226	due to shared interrupts in the dump-capture kernel.
113	5) Once the second kernel has booted, write out the dump file using	227
		228	* You must specify <root-dev> in the format corresponding to the root
		229	device name in the output of mount command.
		230
		231	* "init 1" boots the dump-capture kernel into single-user mode without
		232	networking. If you want networking, use "init 3."
		233
		234
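The load command and the notes above can be combined into a small dry-run helper. This is only a sketch: the function name build_kexec_cmd and the example paths are hypothetical, and the script prints the command line instead of running kexec, so the arguments can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of kexec-tools): print the kexec command
# line described above rather than executing it.
# Usage: build_kexec_cmd <dump-capture-kernel> <initrd> <root-dev>
build_kexec_cmd() {
    kernel=$1    # must be an uncompressed vmlinux image, not a bzImage
    initrd=$2    # initrd for the dump-capture kernel
    rootdev=$3   # root device name as shown in the output of mount
    printf 'kexec -p %s --initrd=%s --args-linux --append="root=%s init 1 irqpoll"\n' \
        "$kernel" "$initrd" "$rootdev"
}

# The paths below are placeholders for illustration, not defaults.
build_kexec_cmd /boot/vmlinux-kdump /boot/initrd-kdump.img /dev/sda1
```

Printing the command first makes it easy to confirm that the root device and initrd match the running system before the dump-capture kernel is actually loaded.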
		235	Kernel Panic
		236	============
		237
		238	After successfully loading the dump-capture kernel as previously
		239	described, the system will reboot into the dump-capture kernel if a
		240	system crash is triggered. Trigger points are located in panic(),
		241	die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
		242
		243	The following conditions will execute a crash trigger point:
		244
		245	If a hard lockup is detected and "NMI watchdog" is configured, the system
		246	will boot into the dump-capture kernel ( die_nmi() ).
		247
		248	If die() is called, and it happens to be a thread with pid 0 or 1, or die()
		249	is called inside interrupt context or die() is called and panic_on_oops is set,
		250	the system will boot into the dump-capture kernel.
		251
		252	On PowerPC systems, when a soft-reset is generated, die() is called by all CPUs and the system will boot into the dump-capture kernel.
		253
		254	For testing purposes, you can trigger a crash by using "ALT-SysRq-c",
		255	"echo c > /proc/sysrq-trigger", or a module written to force the panic.
		256
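Before forcing a test crash, it can help to confirm that a dump-capture kernel is actually loaded. The sketch below assumes the running kernel exposes /sys/kernel/kexec_crash_loaded, a sysfs file that this document does not mention and that may be absent on older kernels. The function name crash_kernel_loaded is hypothetical. The script only reports the state; it does not trigger a crash itself.

```shell
#!/bin/sh
# Assumption: /sys/kernel/kexec_crash_loaded exists and contains "1"
# when a crash (dump-capture) kernel has been loaded with kexec -p.
# An alternative path can be passed as the first argument.
crash_kernel_loaded() {
    state_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

if crash_kernel_loaded; then
    echo "dump-capture kernel is loaded; a test crash can be triggered with:"
    echo "  echo c > /proc/sysrq-trigger"
else
    echo "no dump-capture kernel loaded; load one with kexec -p first"
fi
```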
		257	Write Out the Dump File
		258	=======================
		259
		260	After the dump-capture kernel is booted, write out the dump file with
		261	the following command:
114		262
115	cp /proc/vmcore <dump-file>	263	cp /proc/vmcore <dump-file>
116		264
117	Dump memory can also be accessed as a /dev/oldmem device for a linear/raw	265	You can also access dumped memory as a /dev/oldmem device for a linear
118	view. To create the device, type:	266	and raw view. To create the device, use the following command:
119		267
120	mknod /dev/oldmem c 1 12	268	mknod /dev/oldmem c 1 12
121		269
122	Use "dd" with suitable options for count, bs and skip to access specific	270	Use the dd command with suitable options for count, bs, and skip to
123	portions of the dump.	271	access specific portions of the dump.
124		272
125	Entire memory: dd if=/dev/oldmem of=oldmem.001	273	To see the entire memory, use the following command:
126		274
		275	dd if=/dev/oldmem of=oldmem.001
127		276
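As an illustration of the count, bs, and skip options mentioned above, the following sketch copies a bounded region of a file or device. The function name read_dump_range and the argument order are hypothetical. Note that skip and count are measured in blocks of bs bytes, not in bytes.

```shell
#!/bin/sh
# Hypothetical helper: copy <count> blocks of <bs> bytes from <src>,
# starting <skip> blocks into it, and write them to <out>.
# Usage: read_dump_range <src> <out> <bs> <skip> <count>
read_dump_range() {
    src=$1
    out=$2
    bs=$3
    skip=$4
    count=$5
    # dd reports transfer statistics on stderr; they are discarded here.
    dd if="$src" of="$out" bs="$bs" skip="$skip" count="$count" 2>/dev/null
}

# Example (run in the dump-capture kernel after creating /dev/oldmem):
# copy 1 MiB of old memory starting at the 16 MiB offset.
#   read_dump_range /dev/oldmem oldmem.16M 1048576 16 1
```

Reading a small range this way avoids copying the whole of memory when only one region is of interest.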
128	ANALYSIS	277
		278	Analysis
129	========	279	========
130	Limited analysis can be done using gdb on the dump file copied out of
131	/proc/vmcore. Use vmlinux built with -g and run
132		280
133	gdb vmlinux <dump-file>	281	Before analyzing the dump image, you should reboot into a stable kernel.
		282
		283	You can do limited analysis using GDB on the dump file copied out of
		284	/proc/vmcore. Use the debug vmlinux built with -g and run the following
		285	command:
		286
		287	gdb vmlinux <dump-file>
134		288
135	Stack trace for the task on processor 0, register display, memory display	289	Stack trace for the task on processor 0, register display, and memory
136	work fine.	290	display work fine.
137		291
138	Note: gdb cannot analyse core files generated in ELF64 format for i386.	292	Note: GDB cannot analyze core files generated in ELF64 format for x86.
		293	On systems with a maximum of 4GB of memory, you can generate
		294	ELF32-format headers by passing the --elf32-core-headers option to kexec
		295	when loading the dump-capture kernel.
139		296
140	Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site	297	You can also use the Crash utility to analyze dump files in Kdump
141	http://people.redhat.com/~anderson/ works well with kdump format.	298	format. Crash is available on Dave Anderson's site at the following URL:
142		299
		300	http://people.redhat.com/~anderson/
		301
		302
		303	To Do
		304	=====
143		305
144	TODO	306	1) Provide a kernel pages filtering mechanism, so core file size is not
145	====	307	extreme on systems with huge memory banks.
146	1) Provide a kernel pages filtering mechanism so that core file size is not
147	insane on systems having huge memory banks.
148	2) Relocatable kernel can help in maintaining multiple kernels for crashdump
149	and same kernel as the first kernel can be used to capture the dump.
150		308
		309	2) Relocatable kernel can help in maintaining multiple kernels for
		310	crash_dump, and the same kernel as the system kernel can be used to
		311	capture the dump.
151		312
152	CONTACT	313
		314	Contact
153	=======	315	=======
		316
154	Vivek Goyal (vgoyal@in.ibm.com)	317	Vivek Goyal (vgoyal@in.ibm.com)
155	Maneesh Soni (maneesh@in.ibm.com)	318	Maneesh Soni (maneesh@in.ibm.com)
		319
		320
		321	Trademark
		322	=========
		323
		324	Linux is a trademark of Linus Torvalds in the United States, other
		325	countries, or both.