author     Linus Torvalds <torvalds@linux-foundation.org>  2012-03-21 21:55:10 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2012-03-21 21:55:10 -0400
commit     5375871d432ae9fc581014ac117b96aaee3cd0c7
tree       be98e8255b0f927fb920fb532a598b93fa140dbe /Documentation
parent     b57cb7231b2ce52d3dda14a7b417ae125fb2eb97
parent     dfbc2d75c1bd47c3186fa91f1655ea2f3825b0ec
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc merge from Benjamin Herrenschmidt:
"Here's the powerpc batch for this merge window. It is going to be a
bit more nasty than usual as in touching things outside of
arch/powerpc mostly due to the big iSeriesectomy :-) We finally got
rid of the bugger (legacy iSeries support) which was a PITA to
maintain and that nobody really used anymore.

Here are some of the highlights:
- Legacy iSeries is gone. Thanks Stephen ! There's still some bits
and pieces remaining if you do a grep -ir series arch/powerpc but
they are harmless and will be removed in the next few weeks
hopefully.
- The 'fadump' functionality (Firmware Assisted Dump) replaces the
previous (equivalent) "pHyp assisted dump"... it's a rewrite of a
mechanism to get the hypervisor to do crash dumps on pSeries, the
new implementation hopefully being much more reliable. Thanks
Mahesh Salgaonkar.
- The "EEH" code (pSeries PCI error handling & recovery) got a big
spring cleaning, motivated by the need to be able to implement a
new backend for it on top of some new different type of firmware.
The work isn't complete yet, but a good chunk of the cleanups is
there. Note that this adds a field to struct device_node which is
not very nice and which Grant objects to. I will have a patch soon
that moves that to a powerpc private data structure (hopefully
before rc1) and we'll improve things further later on (hopefully
getting rid of the need for that pointer completely). Thanks Gavin
Shan.
- I dug into our exception & interrupt handling code to improve the
way we do lazy interrupt handling (and make it work properly with
"edge" triggered interrupt sources), and while at it found & fixed
a wagon of issues in those areas, including adding support for page
fault retry & fatal signals on page faults.
- Your usual random batch of small fixes & updates, including a bunch
of new embedded boards, both Freescale and APM based ones, etc..."

I fixed up some conflicts with the generalized irq-domain changes from
Grant Likely, hopefully correctly.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits)
powerpc/ps3: Do not adjust the wrapper load address
powerpc: Remove the rest of the legacy iSeries include files
powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
init: Remove CONFIG_PPC_ISERIES
powerpc: Remove FW_FEATURE ISERIES from arch code
tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable
powerpc/spufs: Fix double unlocks
powerpc/5200: convert mpc5200 to use of_platform_populate()
powerpc/mpc5200: add options to mpc5200_defconfig
powerpc/mpc52xx: add a4m072 board support
powerpc/mpc5200: update mpc5200_defconfig to fit for charon board
Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup
powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board
powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
MAINTAINERS: Update PowerPC 4xx tree
powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
powerpc: document the FSL MPIC message register binding
powerpc: add support for MPIC message register API
powerpc/fsl: Added aliased MSIIR register address to MSI node in dts
powerpc/85xx: mpc8548cds - add 36-bit dts
...
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt |  63
-rw-r--r-- Documentation/devicetree/bindings/powerpc/fsl/mpic.txt      |  22
-rw-r--r-- Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt   |   6
-rw-r--r-- Documentation/powerpc/firmware-assisted-dump.txt            | 270
-rw-r--r-- Documentation/powerpc/mpc52xx.txt                           |  12
-rw-r--r-- Documentation/powerpc/phyp-assisted-dump.txt                | 127
6 files changed, 364 insertions, 136 deletions

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt
new file mode 100644
index 000000000000..bc8ded641ab6
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt
@@ -0,0 +1,63 @@
* FSL MPIC Message Registers

This binding specifies what properties must be available in the device tree
representation of the message register blocks found in some FSL MPIC
implementations.

Required properties:

- compatible: Specifies the compatibility list for the message register
  block. The type shall be <string-list> and the value shall be of the form
  "fsl,mpic-v<version>-msgr", where <version> is the version number of
  the MPIC containing the message registers.

- reg: Specifies the base physical address(es) and size(s) of the
  message register block's addressable register space. The type shall be
  <prop-encoded-array>.

- interrupts: Specifies a list of interrupt-specifiers which are available
  for receiving interrupts. Each interrupt-specifier consists of two cells:
  the first cell is the interrupt number and the second cell is the
  level-sense. The type shall be <prop-encoded-array>.

Optional properties:

- mpic-msgr-receive-mask: Specifies which registers in the containing block
  are allowed to receive interrupts. The value is a bit mask where a set
  bit at bit 'n' indicates that message register 'n' can receive interrupts.
  Note that "bit 'n'" is numbered from the LSB for PPC hardware. The type
  shall be <u32>. If not present, then all of the message registers in the
  block are available.
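
The LSB numbering note is easy to get backwards, because PowerPC manuals usually number bits from the most significant end. Below is a minimal Python sketch of decoding the mask. It assumes four message registers per block. The binding does not say how many registers a block holds, so `num_regs` and the helper name are hypothetical.

```python
def receiving_registers(mask=None, num_regs=4):
    """Return the message registers allowed to receive interrupts.

    mask:     value of mpic-msgr-receive-mask, or None if the property is absent.
    num_regs: registers per block; 4 is an assumption, not part of the binding.
    Bit n is counted from the LSB, so register 0 corresponds to the value 0x1.
    """
    if mask is None:
        # Property absent: every register in the block is available.
        return list(range(num_regs))
    return [n for n in range(num_regs) if mask & (1 << n)]


print(receiving_registers(0x5))  # -> [0, 2]
```

With the mask value 0x5 used in the binding's example, only registers 0 and 2 are selected.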

Aliases:

An alias should be created for every message register block. Aliases are
not required by this binding, though a particular implementation of it
may require them to be present. Aliases are of the form
'mpic-msgr-block<n>', where <n> is an integer specifying the block's number.
Numbers shall start at 0.

Example:

    aliases {
        mpic-msgr-block0 = &mpic_msgr_block0;
        mpic-msgr-block1 = &mpic_msgr_block1;
    };

    mpic_msgr_block0: mpic-msgr-block@41400 {
        compatible = "fsl,mpic-v3.1-msgr";
        reg = <0x41400 0x200>;
        // Message registers 0 and 2 in this block can receive interrupts on
        // sources 0xb0 and 0xb2, respectively.
        interrupts = <0xb0 2 0xb2 2>;
        mpic-msgr-receive-mask = <0x5>;
    };

    mpic_msgr_block1: mpic-msgr-block@42400 {
        compatible = "fsl,mpic-v3.1-msgr";
        reg = <0x42400 0x200>;
        // Message registers 0 and 2 in this block can receive interrupts on
        // sources 0xb4 and 0xb6, respectively.
        interrupts = <0xb4 2 0xb6 2>;
        mpic-msgr-receive-mask = <0x5>;
    };

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
index 2cf38bd841fd..dc5744636a57 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
@@ -56,7 +56,27 @@ PROPERTIES
           to the client. The presence of this property also mandates
           that any initialization related to interrupt sources shall
           be limited to sources explicitly referenced in the device tree.
 
+  - big-endian
+      Usage: optional
+      Value type: <empty>
+          If present, the MPIC will be assumed to be big-endian. Some
+          device-trees omit this property on MPIC nodes even when the MPIC is
+          in fact big-endian, so certain boards override this property.
+
+  - single-cpu-affinity
+      Usage: optional
+      Value type: <empty>
+          If present, the MPIC will be assumed to only be able to route
+          non-IPI interrupts to a single CPU at a time (e.g. Freescale MPIC).
+
+  - last-interrupt-source
+      Usage: optional
+      Value type: <u32>
+          Some MPICs do not correctly report the number of hardware sources
+          in the global feature registers. If specified, this field will
+          override the value read from MPIC_GREG_FEATURE_LAST_SRC.
+
 INTERRUPT SPECIFIER DEFINITION
 
   Interrupt specifiers consists of 4 cells encoded as
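
To make the `last-interrupt-source` override concrete, here is a Python sketch of how a driver might choose the source count. Two points are assumptions based on the field's name: the position of MPIC_GREG_FEATURE_LAST_SRC (bits 16 to 26 of the global feature register) and the "last index plus one" conversion. Confirm both against the MPIC reference manual and arch/powerpc/include/asm/mpic.h before relying on them. `mpic_num_sources` is a hypothetical helper.

```python
# Assumed position of the LAST_SRC field in the global feature register.
LAST_SRC_MASK = 0x07FF0000
LAST_SRC_SHIFT = 16


def mpic_num_sources(greg_feature, last_interrupt_source=None):
    """Number of interrupt sources, honouring the device-tree override.

    greg_feature:          raw global feature register value.
    last_interrupt_source: value of the optional property, or None if absent.
    """
    if last_interrupt_source is not None:
        # The property replaces whatever the hardware reports.
        last = last_interrupt_source
    else:
        last = (greg_feature & LAST_SRC_MASK) >> LAST_SRC_SHIFT
    # The field holds the number of the *last* source, so count = last + 1.
    return last + 1
```

For example, a feature register reporting 255 as the last source yields 256 sources. The same register with `last-interrupt-source = <511>` yields 512.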

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt b/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt
index 5d586e1ccaf5..5693877ab377 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt
@@ -6,8 +6,10 @@ Required properties:
   etc.) and the second is "fsl,mpic-msi" or "fsl,ipic-msi" depending on
   the parent type.
 
-- reg : should contain the address and the length of the shared message
-  interrupt register set.
+- reg : It may contain one or two regions. The first region should contain
+  the address and the length of the shared message interrupt register set.
+  The second region should contain the address of the aliased MSIIR register
+  for platforms that have such an alias.
 
 - msi-available-ranges: use <start count> style section to define which
   msi interrupt can be used in the 256 msi interrupts. This property is
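
After the `reg` change, a consumer has to cope with either one region or two. The Python sketch below makes several assumptions. The caller supplies the parent node's #address-cells and #size-cells (1 and 1 by default here). The second region is encoded as an ordinary address/size pair, like any `reg` entry. The addresses in the tests are made up. `split_reg` and `msi_regions` are hypothetical helpers, not kernel functions.

```python
def split_reg(cells, address_cells=1, size_cells=1):
    """Split a flat 'reg' cell list into (address, size) tuples.

    Multi-cell numbers are combined most-significant cell first, as in the
    flattened device tree encoding.
    """
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("reg length is not a multiple of the entry size")

    def combine(part):
        value = 0
        for cell in part:
            value = (value << 32) | cell
        return value

    return [(combine(cells[i:i + address_cells]),
             combine(cells[i + address_cells:i + stride]))
            for i in range(0, len(cells), stride)]


def msi_regions(cells, **cell_sizes):
    """Return (shared_register_set, aliased_msiir_or_None)."""
    regions = split_reg(cells, **cell_sizes)
    if not 1 <= len(regions) <= 2:
        raise ValueError("expected one or two reg regions")
    return regions[0], (regions[1] if len(regions) == 2 else None)
```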

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
new file mode 100644
index 000000000000..3007bc98af28
--- /dev/null
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -0,0 +1,270 @@

Firmware-Assisted Dump
----------------------
July 2011

The goal of firmware-assisted dump is to enable the dump of
a crashed system, and to do so from a fully-reset system, and
to minimize the total elapsed time until the system is back
in production use.

- Firmware-assisted dump (fadump) infrastructure is intended to replace
  the existing phyp assisted dump.
- Fadump uses the same firmware interfaces and memory reservation model
  as phyp assisted dump.
- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore
  in the ELF format in the same way as kdump. This helps us reuse the
  kdump infrastructure for dump capture and filtering.
- Unlike phyp dump, the userspace tool does not need to refer to any sysfs
  interface while reading /proc/vmcore.
- Unlike phyp dump, fadump allows the user to release all the memory reserved
  for the dump with a single operation: echo 1 > /sys/kernel/fadump_release_mem.
- Once enabled through the kernel boot parameter, fadump can be
  started/stopped through the /sys/kernel/fadump_registered interface (see
  the sysfs files section below) and can easily be integrated with kdump
  service start/stop init scripts.

Compared with kdump or other strategies, firmware-assisted
dump offers several strong, practical advantages:

-- Unlike kdump, the system has been reset, and loaded
   with a fresh copy of the kernel. In particular,
   PCI and I/O devices have been reinitialized and are
   in a clean, consistent state.
-- Once the dump is copied out, the memory that held the dump
   is immediately available to the running kernel. Therefore,
   unlike kdump, fadump doesn't need a second reboot to get
   the system back to the production configuration.

The above can only be accomplished by coordination with,
and assistance from, the Power firmware. The procedure is
as follows:

-- The first kernel registers the sections of memory with the
   Power firmware for dump preservation during OS initialization.
   These registered sections of memory are reserved by the first
   kernel during early boot.

-- When a system crashes, the Power firmware will save
   the low memory (boot memory, whose size is the larger of 5% of
   system RAM or 256MB) to the previously registered region. It will
   also save system registers and hardware PTEs.

   NOTE: The term 'boot memory' means the size of the low memory chunk
         that is required for a kernel to boot successfully when
         booted with restricted memory. By default, the boot memory
         size will be the larger of 5% of system RAM or 256MB.
         Alternatively, the user can also specify the boot memory size
         through the boot parameter 'fadump_reserve_mem=', which will
         override the default calculated size. Use this option
         if the default boot memory size is not sufficient for the
         second kernel to boot successfully.
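
The sizing rule in the NOTE can be sketched in Python. This is a minimal illustration of the documented rule only: larger of 5% of RAM or 256MB, overridden by `fadump_reserve_mem=`. The helper names are hypothetical, and only K, M and G suffixes are handled when parsing the parameter.

```python
MIN_BOOT_MEM = 256 << 20   # 256MB floor from the text
RESERVE_PERCENT = 5        # 5% of system RAM from the text

_SHIFTS = {"K": 10, "M": 20, "G": 30}


def parse_size(text):
    """Parse '512M', '1G' or '0x10000000' (K/M/G suffixes only)."""
    text = text.strip()
    shift = 0
    if text and text[-1].upper() in _SHIFTS:
        shift = _SHIFTS[text[-1].upper()]
        text = text[:-1]
    return int(text, 0) << shift


def boot_memory_size(ram_bytes, cmdline=""):
    """Boot memory size preserved by fadump, per the rule described above."""
    for arg in cmdline.split():
        if arg.startswith("fadump_reserve_mem="):
            # An explicit size overrides the calculated default.
            return parse_size(arg.split("=", 1)[1])
    return max(ram_bytes * RESERVE_PERCENT // 100, MIN_BOOT_MEM)
```

On a 1GB system, 5% is about 51MB, so the 256MB floor applies. On an 8GB system, 5% (about 410MB) wins. The kernel's own code may additionally round or align the result; that is not covered by the text and is not modelled here.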

-- After the low memory (boot memory) area has been saved, the
   firmware will reset PCI and other hardware state. It will
   *not* clear the RAM. It will then launch the bootloader, as
   normal.

-- The freshly booted kernel will notice that there is a new
   node (ibm,dump-kernel) in the device tree, indicating that
   there is crash data available from a previous boot. During
   early boot, the OS will reserve the rest of the memory above
   the boot memory size, effectively booting with a restricted
   memory size. This makes sure that the second kernel will not
   touch any of the dump memory area.

-- User-space tools will read /proc/vmcore to obtain the contents
   of memory, which holds the previous crashed kernel dump in ELF
   format. The userspace tools may copy this info to disk, or
   network, NAS, SAN, iSCSI, etc. as desired.

-- Once the userspace tool is done saving the dump, it will echo
   '1' to /sys/kernel/fadump_release_mem to release the reserved
   memory back to general use, except the memory required for
   the next firmware-assisted dump registration.

   e.g.
     # echo 1 > /sys/kernel/fadump_release_mem
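
The last two steps (save the dump, then release the memory) can be sketched in Python as follows. The paths default to the ones named above. They are parameters so that the flow can be tried on ordinary files. `save_dump_and_release` is a hypothetical helper. A plain byte copy stands in for whatever filtering a real kdump script might apply.

```python
import os
from pathlib import Path

VMCORE = "/proc/vmcore"
RELEASE_MEM = "/sys/kernel/fadump_release_mem"


def save_dump_and_release(dest, vmcore=VMCORE, release_mem=RELEASE_MEM,
                          chunk=1 << 20):
    """Copy the ELF dump to dest, then release the reserved memory.

    Returns the number of bytes copied. The release write happens only
    after the copy has been flushed to disk, since releasing hands the
    memory holding the dump back to the running kernel.
    """
    copied = 0
    with open(vmcore, "rb") as src, open(dest, "wb") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
        dst.flush()
        os.fsync(dst.fileno())
    # Same effect as: echo 1 > /sys/kernel/fadump_release_mem
    Path(release_mem).write_text("1\n")
    return copied
```

If the copy raises, the release write is never reached, so the dump stays reserved and can be retried.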

Please note that the firmware-assisted dump feature
is only available on Power6 and above systems with recent
firmware versions.

Implementation details:
-----------------------

During boot, a check is made to see if firmware supports
this feature on that particular machine. If it does, then
we check to see if an active dump is waiting for us. If yes,
then everything but the boot memory size of RAM is reserved during
early boot (see Fig. 2). This area is released once we finish
collecting the dump from user land scripts (e.g. kdump scripts)
that are run. If there is dump data, then the
/sys/kernel/fadump_release_mem file is created, and the reserved
memory is held.

If there is no waiting dump data, then only the memory required
to hold the CPU state, HPTE region, boot memory dump and ELF core
header is reserved at the top of memory (see Fig. 1). This area
is *not* released: this region will be kept permanently reserved,
so that it can act as a receptacle for a copy of the boot memory
content in addition to the CPU state and HPTE region, in case a
crash does occur.

  o Memory Reservation during first kernel

  Low memory                                        Top of memory
  0      boot memory size                                       |
  |           |                       |<--Reserved dump area -->|
  V           V                       |   Permanent Reservation V
  +-----------+----------/ /----------+---+----+-----------+----+
  |           |                       |CPU|HPTE|  DUMP     |ELF |
  +-----------+----------/ /----------+---+----+-----------+----+
        |                                           ^
        |                                           |
        \                                           /
         -------------------------------------------
          Boot memory content gets transferred to
          reserved area by firmware at the time of
          crash
                           Fig. 1

  o Memory Reservation during second kernel after crash

  Low memory                                        Top of memory
  0      boot memory size                                       |
  |           |<------------- Reserved dump area ----------- -->|
  V           V                                                 V
  +-----------+----------/ /----------+---+----+-----------+----+
  |           |                       |CPU|HPTE|  DUMP     |ELF |
  +-----------+----------/ /----------+---+----+-----------+----+
        |                                                    |
        V                                                    V
   Used by second                                    /proc/vmcore
   kernel to boot
                           Fig. 2

Currently the dump will be copied from /proc/vmcore to
a new file upon user intervention. The dump data available through
/proc/vmcore will be in ELF format. Hence the existing kdump
infrastructure (kdump scripts) to save the dump works fine with
minor modifications.

The tools to examine the dump will be the same as the ones
used for kdump.

How to enable firmware-assisted dump (fadump):
----------------------------------------------

1. Set config option CONFIG_FA_DUMP=y and build the kernel.
2. Boot into the Linux kernel with the 'fadump=on' kernel cmdline option.
3. Optionally, the user can also set the 'fadump_reserve_mem=' kernel
   cmdline option to specify the size of the memory to reserve for boot
   memory dump preservation.

NOTE: If firmware-assisted dump fails to reserve memory, then it will
      fall back to the existing kdump mechanism if the 'crashkernel='
      option is set on the kernel cmdline.
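
The enable and fallback rules above reduce to a small decision. Below is a minimal Python sketch of that decision. It simplifies in two ways. First, firmware support and the reservation outcome are folded into one `fadump_reserved` input, which the kernel decides at boot. Second, the kernel build is assumed to have CONFIG_FA_DUMP=y. `dump_mechanism` is a hypothetical helper.

```python
def dump_mechanism(cmdline, fadump_reserved):
    """Return 'fadump', 'kdump' or 'none' for a given kernel command line.

    fadump_reserved: whether fadump managed to reserve its memory
    (an input here; in reality the kernel determines it during boot).
    """
    params = dict(arg.split("=", 1) if "=" in arg else (arg, "")
                  for arg in cmdline.split())
    if params.get("fadump") == "on" and fadump_reserved:
        return "fadump"
    # Per the NOTE: fall back to kdump when crashkernel= is present.
    if "crashkernel" in params:
        return "kdump"
    return "none"
```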

Sysfs/debugfs files:
--------------------

The firmware-assisted dump feature uses the sysfs file system to hold
the control files and a debugfs file to display the reserved memory regions.

Here is the list of files under kernel sysfs:

 /sys/kernel/fadump_enabled

    This is used to display the fadump status.
    0 = fadump is disabled
    1 = fadump is enabled

    This interface can be used by kdump init scripts to identify if
    fadump is enabled in the kernel and act accordingly.

 /sys/kernel/fadump_registered

    This is used to display the fadump registration status as well
    as to control (start/stop) the fadump registration.
    0 = fadump is not registered.
    1 = fadump is registered and ready to handle system crash.

    To register fadump, echo 1 > /sys/kernel/fadump_registered; to
    un-register and stop fadump, echo 0 > /sys/kernel/fadump_registered.
    Once fadump is un-registered, a system crash will not be handled
    and a vmcore will not be captured. This interface can easily be
    integrated with kdump service start/stop.
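
Here is a Python sketch of how a kdump-style start/stop hook might use these two files. The sysfs directory is a parameter so the hook can be exercised against a scratch directory. `fadump_service` and its fall-back-to-kdump return convention are illustrative, not an existing script's interface.

```python
from pathlib import Path

SYSFS = Path("/sys/kernel")


def fadump_service(action, sysfs=SYSFS):
    """Handle 'start' or 'stop' through fadump_registered.

    Returns True if fadump handled the action, or False if fadump is not
    enabled and the caller should fall back to its kdump path.
    """
    if (sysfs / "fadump_enabled").read_text().strip() != "1":
        return False
    want = {"start": "1", "stop": "0"}[action]
    reg = sysfs / "fadump_registered"
    # Skip the write if registration is already in the requested state.
    if reg.read_text().strip() != want:
        reg.write_text(want + "\n")
    return True
```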

 /sys/kernel/fadump_release_mem

    This file is available only when fadump is active during the
    second kernel. It is used to release the reserved memory
    regions that are held for saving the crash dump. To release the
    reserved memory, echo 1 to it:

    echo 1 > /sys/kernel/fadump_release_mem

    After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region
    file will change to reflect the new memory reservations.

    The existing userspace tools (kdump infrastructure) can easily be
    enhanced to use this interface to release the memory reserved for
    the dump and continue without a second reboot.

Here is the list of files under powerpc debugfs:
(Assuming debugfs is mounted on the /sys/kernel/debug directory.)

 /sys/kernel/debug/powerpc/fadump_region

    This file shows the reserved memory regions if fadump is
    enabled; otherwise this file is empty. The output format
    is:
    <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>

    e.g.
    Contents when fadump is registered during the first kernel

    # cat /sys/kernel/debug/powerpc/fadump_region
    CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
    HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
    DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0

    Contents when fadump is active during the second kernel

    # cat /sys/kernel/debug/powerpc/fadump_region
    CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
    HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
    DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
        : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000
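
A script could use the format above to check whether the dump covers every region. This Python parser follows the documented line shape. It accepts an empty region name, as on the last line of the second-kernel sample. The parser and the "fully dumped" check are a hypothetical illustration, tested only against the samples shown here.

```python
import re
from typing import List, NamedTuple

# <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
_LINE = re.compile(
    r"^\s*(?P<name>\w*)\s*:\s*\[(?P<start>0x[0-9a-fA-F]+)-(?P<end>0x[0-9a-fA-F]+)\]"
    r"\s+(?P<size>0x[0-9a-fA-F]+) bytes, Dumped: (?P<dumped>0x[0-9a-fA-F]+)\s*$"
)


class FadumpRegion(NamedTuple):
    name: str
    start: int
    end: int      # inclusive
    size: int
    dumped: int


def parse_fadump_region(text) -> List[FadumpRegion]:
    regions = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            regions.append(FadumpRegion(
                m["name"],
                *(int(m[k], 16) for k in ("start", "end", "size", "dumped"))))
    return regions


def dump_complete(regions):
    """True if every listed region reports its full size as dumped."""
    return bool(regions) and all(r.dumped == r.size for r in regions)
```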

NOTE: Please refer to Documentation/filesystems/debugfs.txt on
      how to mount the debugfs filesystem.


TODO:
-----
 o Need to come up with a better approach to find out a more
   accurate boot memory size that is required for a kernel to
   boot successfully when booted with restricted memory.
 o The fadump implementation introduces a fadump crash info structure
   in the scratch area before the ELF core header. The idea of introducing
   this structure is to pass some important crash info data to the second
   kernel, which will help the second kernel populate the ELF core header
   with correct data before it gets exported through /proc/vmcore. The
   current design implementation does not address the possibility of
   introducing additional fields (in the future) to this structure without
   affecting compatibility. Need to come up with a better approach to
   address this. The possible approaches are:
	1. Introduce a version field for version tracking, and bump up the
	   version whenever a new field is added to the structure in the
	   future. The version field can be used to find out which fields
	   are valid for the current version of the structure.
	2. Reserve an area of predefined size (say PAGE_SIZE) for this
	   structure and keep the unused area reserved (initialized to zero)
	   for future field additions.
   The advantage of approach 1 over 2 is that we don't need to reserve
   extra space.
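
Approach 1 works as long as new fields are only ever appended. A reader then trusts just the fields defined for the older of the two versions (writer's and reader's). Here is a minimal Python sketch using the `struct` module. The layout, magic value, field names and byte order are all made up for illustration. None of them is the kernel's actual fadump crash info structure.

```python
import struct

MAGIC = 0x46414455                       # arbitrary illustrative value
_HEADER = struct.Struct(">QI")           # magic, version (no padding with '>')
# Each version repeats the older fields and appends new ones at the end.
_FIELDS = {1: struct.Struct(">I"),       # crashing_cpu
           2: struct.Struct(">IQ")}      # crashing_cpu, panic_timestamp
_NAMES = {1: ("crashing_cpu",),
          2: ("crashing_cpu", "panic_timestamp")}


def pack_crash_info(version, *values):
    return _HEADER.pack(MAGIC, version) + _FIELDS[version].pack(*values)


def unpack_crash_info(buf, max_known=2):
    """Return (writer_version, fields understood by this reader)."""
    magic, version = _HEADER.unpack_from(buf)
    if magic != MAGIC or version < 1:
        raise ValueError("not a valid crash info header")
    usable = min(version, max_known)     # ignore fields newer than we know
    values = _FIELDS[usable].unpack_from(buf, _HEADER.size)
    return version, dict(zip(_NAMES[usable], values))
```

An old reader (`max_known=1`) can still decode a version 2 record, because the version 1 fields sit at the same offsets.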
267 | --- | ||
268 | Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> | ||
269 | This document is based on the original documentation written for phyp | ||
270 | assisted dump by Linas Vepstas and Manish Ahuja. | ||

diff --git a/Documentation/powerpc/mpc52xx.txt b/Documentation/powerpc/mpc52xx.txt
index 10dd4ab93b85..0d540a31ea1a 100644
--- a/Documentation/powerpc/mpc52xx.txt
+++ b/Documentation/powerpc/mpc52xx.txt
@@ -2,7 +2,7 @@ Linux 2.6.x on MPC52xx family
 -----------------------------
 
 For the latest info, go to http://www.246tNt.com/mpc52xx/
 
 To compile/use :
 
   - U-Boot:
@@ -10,23 +10,23 @@ To compile/use :
         if you wish to ).
      # make lite5200_defconfig
      # make uImage
 
      then, on U-boot:
      => tftpboot 200000 uImage
      => tftpboot 400000 pRamdisk
      => bootm 200000 400000
 
   - DBug:
      # <edit Makefile to set ARCH=ppc & CROSS_COMPILE=... ( also EXTRAVERSION
        if you wish to ).
      # make lite5200_defconfig
      # cp your_initrd.gz arch/ppc/boot/images/ramdisk.image.gz
      # make zImage.initrd
      # make
 
      then in DBug:
      DBug> dn -i zImage.initrd.lite5200
 
 
 Some remarks :
  - The port is named mpc52xxx, and config options are PPC_MPC52xx. The MGT5100

diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt
deleted file mode 100644
index ad340205d96a..000000000000
--- a/Documentation/powerpc/phyp-assisted-dump.txt
+++ /dev/null
@@ -1,127 +0,0 @@

Hypervisor-Assisted Dump
------------------------
November 2007

The goal of hypervisor-assisted dump is to enable the dump of
a crashed system, and to do so from a fully-reset system, and
to minimize the total elapsed time until the system is back
in production use.

As compared to kdump or other strategies, hypervisor-assisted
dump offers several strong, practical advantages:

-- Unlike kdump, the system has been reset, and loaded
   with a fresh copy of the kernel. In particular,
   PCI and I/O devices have been reinitialized and are
   in a clean, consistent state.
-- As the dump is performed, the dumped memory becomes
   immediately available to the system for normal use.
-- After the dump is completed, no further reboots are
   required; the system will be fully usable, and running
   in its normal, production mode on its normal kernel.

The above can only be accomplished by coordination with,
and assistance from, the hypervisor. The procedure is
as follows:

-- When a system crashes, the hypervisor will save
   the low 256MB of RAM to a previously registered
   save region. It will also save system state, system
   registers, and hardware PTEs.

-- After the low 256MB area has been saved, the
   hypervisor will reset PCI and other hardware state.
   It will *not* clear RAM. It will then launch the
   bootloader, as normal.

-- The freshly booted kernel will notice that there
   is a new node (ibm,dump-kernel) in the device tree,
   indicating that there is crash data available from
   a previous boot. It will boot into only 256MB of RAM,
   reserving the rest of system memory.

-- Userspace tools will parse /sys/kernel/release_region
   and read /proc/vmcore to obtain the contents of memory,
   which holds the previous crashed kernel. The userspace
   tools may copy this info to disk, or network, NAS, SAN,
   iSCSI, etc. as desired.

   For example, the values in /sys/kernel/release_region
   would look something like this (address-range pairs).
   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A

-- As the userspace tools complete saving a portion of
   the dump, they echo an offset and size to
   /sys/kernel/release_region to release the reserved
   memory back to general use.

   An example of this is:
     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
   which will release 256MB at the 1GB boundary.
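
The offset/size protocol of this (now removed) interface can be illustrated in Python. The sketch generates the lines a tool would echo as it releases a saved range piece by piece. The 256MB step and the helper name are assumptions. The first line it produces matches the example above: 0x40000000 is the 1GB boundary and 0x10000000 is 256MB.

```python
def release_commands(start, end, chunk=256 << 20):
    """Yield 'offset size' lines covering [start, end) in chunk-sized steps."""
    offset = start
    while offset < end:
        size = min(chunk, end - offset)
        yield f"{offset:#x} {size:#x}"
        offset += size
```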

Please note that the hypervisor-assisted dump feature
is only available on Power6-based systems with recent
firmware versions.

Implementation details:
-----------------------

During boot, a check is made to see if firmware supports
this feature on this particular machine. If it does, then
we check to see if an active dump is waiting for us. If yes,
then everything but 256MB of RAM is reserved during early
boot. This area is released once we collect a dump from user
land scripts that are run. If there is dump data, then
the /sys/kernel/release_region file is created, and
the reserved memory is held.

If there is no waiting dump data, then only the highest
256MB of RAM is reserved as a scratch area. This area
is *not* released: this region will be kept permanently
reserved, so that it can act as a receptacle for a copy
of the low 256MB in case a crash does occur. See,
however, "open issues" below, as to whether
such a reserved region is really needed.

Currently the dump will be copied from /proc/vmcore to
a new file upon user intervention. The starting address
to be read and the range for each data point is provided
in /sys/kernel/release_region.

The tools to examine the dump will be the same as the ones
used for kdump.

General notes:
--------------
Security: please note that there are potential security issues
with any sort of dump mechanism. In particular, plaintext
(unencrypted) data, and possibly passwords, may be present in
the dump data. Userspace tools must take adequate precautions to
preserve security.

Open issues/ToDo:
-----------------
 o The various code paths that tell the hypervisor that a crash
   occurred, vs. it simply being a normal reboot, should be
   reviewed, and possibly clarified/fixed.

 o Instead of using /sys/kernel, should there be a /sys/dump
   instead? There is a dump_subsys being created by the s390 code;
   perhaps the pseries code should use a similar layout as well.

 o Is reserving a 256MB region really required? The goal of
   reserving a 256MB scratch area is to make sure that no
   important crash data is clobbered when the hypervisor
   saves low memory to the scratch area. But, if one could assure
   that nothing important is located in some 256MB area, then
   it would not need to be reserved. Something that can be
   improved in subsequent versions.

 o Still working with the kdump team to integrate this with kdump;
   some work remains, but this would not affect the current
   patches.

 o Still need to write a shell script to copy the dump away.
   Currently I am parsing it manually.