| author | Jonathan Herman <hermanjl@cs.unc.edu> | 2013-01-22 10:38:37 -0500 |
|---|---|---|
| committer | Jonathan Herman <hermanjl@cs.unc.edu> | 2013-01-22 10:38:37 -0500 |
| commit | fcc9d2e5a6c89d22b8b773a64fb4ad21ac318446 (patch) | |
| tree | a57612d1888735a2ec7972891b68c1ac5ec8faea /Documentation/powerpc | |
| parent | 8dea78da5cee153b8af9c07a2745f6c55057fe12 (diff) | |
Diffstat (limited to 'Documentation/powerpc')
| -rw-r--r-- | Documentation/powerpc/phyp-assisted-dump.txt | 127 |
1 files changed, 127 insertions, 0 deletions
diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt new file mode 100644 index 00000000000..ad340205d96 --- /dev/null +++ b/Documentation/powerpc/phyp-assisted-dump.txt | |||
| @@ -0,0 +1,127 @@ | |||
| 1 | |||
| 2 | Hypervisor-Assisted Dump | ||
| 3 | ------------------------ | ||
| 4 | November 2007 | ||
| 5 | |||
| 6 | The goal of hypervisor-assisted dump is to enable the dump of | ||
| 7 | a crashed system, and to do so from a fully-reset system, and | ||
| 8 | to minimize the total elapsed time until the system is back | ||
| 9 | in production use. | ||
| 10 | |||
| 11 | As compared to kdump or other strategies, hypervisor-assisted | ||
| 12 | dump offers several strong, practical advantages: | ||
| 13 | |||
| 14 | -- Unlike kdump, the system has been reset, and loaded | ||
| 15 | with a fresh copy of the kernel. In particular, | ||
| 16 | PCI and I/O devices have been reinitialized and are | ||
| 17 | in a clean, consistent state. | ||
| 18 | -- As the dump is performed, the dumped memory becomes | ||
| 19 | immediately available to the system for normal use. | ||
| 20 | -- After the dump is completed, no further reboots are | ||
| 21 | required; the system will be fully usable, and running | ||
| 22 | in its normal, production mode on its normal kernel. | ||
| 23 | |||
| 24 | The above can only be accomplished by coordination with, | ||
| 25 | and assistance from the hypervisor. The procedure is | ||
| 26 | as follows: | ||
| 27 | |||
| 28 | -- When a system crashes, the hypervisor will save | ||
| 29 | the low 256MB of RAM to a previously registered | ||
| 30 | save region. It will also save system state, system | ||
| 31 | registers, and hardware PTEs. | ||
| 32 | |||
| 33 | -- After the low 256MB area has been saved, the | ||
| 34 | hypervisor will reset PCI and other hardware state. | ||
| 35 | It will *not* clear RAM. It will then launch the | ||
| 36 | bootloader, as normal. | ||
| 37 | |||
| 38 | -- The freshly booted kernel will notice that there | ||
| 39 | is a new node (ibm,dump-kernel) in the device tree, | ||
| 40 | indicating that there is crash data available from | ||
| 41 | a previous boot. It will boot into only 256MB of RAM, | ||
| 42 | reserving the rest of system memory. | ||
| 43 | |||
| 44 | -- Userspace tools will parse /sys/kernel/release_region | ||
| 45 | and read /proc/vmcore to obtain the contents of memory, | ||
| 46 | which holds the previously crashed kernel. The userspace | ||
| 47 | tools may copy this info to disk, or network, NAS, SAN, | ||
| 48 | iSCSI, etc. as desired. | ||
| 49 | |||
| 50 | For example, the values in /sys/kernel/release_region | ||
| 51 | would look something like this (address-range pairs): | ||
| 52 | CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / | ||
| 53 | DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A | ||
| 54 | |||
| 55 | -- As the userspace tools complete saving a portion of | ||
| 56 | dump, they echo an offset and size to | ||
| 57 | /sys/kernel/release_region to release the reserved | ||
| 58 | memory back to general use. | ||
| 59 | |||
| 60 | An example of this is: | ||
| 61 | "echo 0x40000000 0x10000000 > /sys/kernel/release_region" | ||
| 62 | which will release 256MB at the 1GB boundary. | ||
| 63 | |||
| 64 | Please note that the hypervisor-assisted dump feature | ||
| 65 | is only available on Power6-based systems with recent | ||
| 66 | firmware versions. | ||
| 67 | |||
| 68 | Implementation details: | ||
| 69 | ---------------------- | ||
| 70 | |||
| 71 | During boot, a check is made to see if firmware supports | ||
| 72 | this feature on this particular machine. If it does, then | ||
| 73 | we check to see if an active dump is waiting for us. If so, | ||
| 74 | then everything but 256MB of RAM is reserved during early | ||
| 75 | boot. This area is released once user-land scripts have | ||
| 76 | collected the dump. If there is dump data, then | ||
| 77 | the /sys/kernel/release_region file is created, and | ||
| 78 | the reserved memory is held. | ||
| 79 | |||
| 80 | If there is no waiting dump data, then only the highest | ||
| 81 | 256MB of RAM is reserved as a scratch area. This area | ||
| 82 | is *not* released: this region will be kept permanently | ||
| 83 | reserved, so that it can act as a receptacle for a copy | ||
| 84 | of the low 256MB in case a crash does occur. See, | ||
| 85 | however, "open issues" below, as to whether | ||
| 86 | such a reserved region is really needed. | ||
| 87 | |||
| 88 | Currently the dump will be copied from /proc/vmcore to | ||
| 89 | a new file upon user intervention. The starting address | ||
| 90 | to be read and the range for each data point are provided | ||
| 91 | in /sys/kernel/release_region. | ||
| 92 | |||
| 93 | The tools to examine the dump will be the same as the ones | ||
| 94 | used for kdump. | ||
| 95 | |||
| 96 | General notes: | ||
| 97 | -------------- | ||
| 98 | Security: please note that there are potential security issues | ||
| 99 | with any sort of dump mechanism. In particular, plaintext | ||
| 100 | (unencrypted) data, and possibly passwords, may be present in | ||
| 101 | the dump data. Userspace tools must take adequate precautions to | ||
| 102 | preserve security. | ||
| 103 | |||
| 104 | Open issues/ToDo: | ||
| 105 | ----------------- | ||
| 106 | o The various code paths that tell the hypervisor that a crash | ||
| 107 | occurred, vs. it simply being a normal reboot, should be | ||
| 108 | reviewed, and possibly clarified/fixed. | ||
| 109 | |||
| 110 | o Instead of using /sys/kernel, should there be a /sys/dump | ||
| 111 | instead? There is a dump_subsys being created by the s390 code, | ||
| 112 | perhaps the pseries code should use a similar layout as well. | ||
| 113 | |||
| 114 | o Is reserving a 256MB region really required? The goal of | ||
| 115 | reserving a 256MB scratch area is to make sure that no | ||
| 116 | important crash data is clobbered when the hypervisor | ||
| 117 | saves low memory to the scratch area. But, if one could assure | ||
| 118 | that nothing important is located in some 256MB area, then | ||
| 119 | it would not need to be reserved. This is something that can | ||
| 120 | be improved in subsequent versions. | ||
| 121 | |||
| 122 | o Still working the kdump team to integrate this with kdump, | ||
| 123 | some work remains but this would not affect the current | ||
| 124 | patches. | ||
| 125 | |||
| 126 | o Still need to write a shell script, to copy the dump away. | ||
| 127 | Currently I am parsing it manually. | ||
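
The release_region example and the "echo offset size" step described in the document could be handled by a small helper along the following lines. This is only a sketch, not part of the patch: it assumes the file consists of "NAME:" labels followed by one or more "0xA-0xB" hex pairs, exactly as in the sample shown in the document, and it does not decide whether the second number of a pair is a size or an end address. The function names (parse_release_region, chunks, release_command) are hypothetical, and the 256MB chunk size is simply the value used in the document's own echo example.

```python
import re

# Assumed layout (taken from the sample in the document, not from kernel code):
# "NAME:" labels, each followed by one or more "0xA-0xB" hex pairs.
_SECTION = re.compile(r"\b([A-Z]+):")
_PAIR = re.compile(r"(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)")


def parse_release_region(text):
    """Return {label: [(first, second), ...]} with both numbers as integers."""
    regions = {}
    # With one capture group, split() yields [prefix, name1, body1, name2, body2, ...].
    parts = _SECTION.split(text)
    for name, body in zip(parts[1::2], parts[2::2]):
        regions[name] = [(int(a, 16), int(b, 16)) for a, b in _PAIR.findall(body)]
    return regions


def chunks(start, size, step=0x10000000):
    """Yield (offset, length) pieces of at most `step` bytes covering the range."""
    end = start + size
    while start < end:
        length = min(step, end - start)
        yield start, length
        start += length


def release_command(offset, size, path="/sys/kernel/release_region"):
    """Build the shell command that hands one saved piece back to the kernel."""
    return f"echo {offset:#x} {size:#x} > {path}"


# Sample string copied from the document, with the "/" continuation joined.
example = ("CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: "
           "DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A")
regions = parse_release_region(example)
```

A collection script might save each piece from /proc/vmcore and then run release_command(offset, length) for it, so that memory returns to general use as the dump progresses. Calling release_command(0x40000000, 0x10000000) reproduces the echo line quoted in the document.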
