author     Lianbo Jiang <lijiang@redhat.com>     2019-01-10 07:19:43 -0500
committer  Borislav Petkov <bp@suse.de>          2019-01-15 05:05:28 -0500
commit     f263245a0ce2c4e23b89a58fa5f7dfc048e11929
tree       3ead0ce78b799de8c647987f7d381e2500139a65
parent     65f750e5457aef9a8085a99d613fea0430303e93
kdump: Document kernel data exported in the vmcoreinfo note

Document data exported in vmcoreinfo and briefly describe its use by
userspace tools.

[ bp: heavily massage and redact the text. ]

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: anderson@redhat.com
Cc: k-hagio@ab.jp.nec.com
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: mingo@redhat.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190110121944.6050-2-lijiang@redhat.com
-rw-r--r--  Documentation/kdump/vmcoreinfo.txt  495
1 files changed, 495 insertions, 0 deletions
diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index 000000000000..bb94a4bd597a
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,495 @@
================================================================
VMCOREINFO
================================================================

===========
What is it?
===========

VMCOREINFO is a special ELF note section. It contains various
information from the kernel, such as structure sizes, the page size,
symbol values and field offsets. This data is packed into an ELF note
section and used by user-space tools like crash and makedumpfile to
analyze a kernel's memory layout.

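The note payload is plain text made of KEY=VALUE lines emitted by the kernel's VMCOREINFO_* macros. For example, SYMBOL(name) is a hex address, SIZE(type) and OFFSET(type.member) are decimal, and NUMBER(name) is a signed decimal. The sketch below parses such a payload into a dictionary. It assumes the note descriptor has already been extracted from the ELF file. The helper names and the sample values are illustrative only.

```python
def parse_vmcoreinfo(text):
    """Parse a VMCOREINFO note payload into a {key: raw value string} dict.

    Each non-empty line has the form KEY=VALUE; only the first '=' splits
    the line, so values containing '=' would survive intact.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value
    return info


def symbol(info, name):
    # SYMBOL(...) values are printed in hexadecimal without a 0x prefix.
    return int(info["SYMBOL(%s)" % name], 16)


def number(info, kind, name):
    # SIZE/OFFSET/LENGTH/NUMBER values are printed in decimal (NUMBER may be negative).
    return int(info["%s(%s)" % (kind, name)], 10)


# Illustrative payload; the values are invented for this example.
sample = """OSRELEASE=5.0.0-rc2
PAGESIZE=4096
SYMBOL(swapper_pg_dir)=ffffffff8220a000
SIZE(page)=64
OFFSET(page.flags)=0
NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
"""

info = parse_vmcoreinfo(sample)
```

The parser keeps raw strings because some entries do not follow the KIND(name) pattern. For example, KERNELOFFSET is a bare hex value and CONFIG_* entries read "=y".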
================
Common variables
================

init_uts_ns.name.release
------------------------

The version of the Linux kernel. Used to find the corresponding source
code from which the kernel has been built. For example, crash uses it to
find the corresponding vmlinux in order to process the vmcore.

PAGE_SIZE
---------

The size of a page. It is the smallest unit of data used by the memory
management facilities. It is usually 4096 bytes in size, and a page is
aligned on a 4096-byte boundary. Used for computing page addresses.

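The page arithmetic used for computing page addresses can be sketched as follows. This assumes PAGE_SIZE is a power of two: 4096 here, taken from the PAGESIZE entry, which gives a PAGE_SHIFT of 12. The function names are hypothetical.

```python
PAGE_SIZE = 4096                         # from PAGESIZE= in VMCOREINFO
PAGE_SHIFT = PAGE_SIZE.bit_length() - 1  # 12 for 4096-byte pages
PAGE_MASK = ~(PAGE_SIZE - 1)


def pfn_of(paddr):
    """Page frame number containing physical address paddr."""
    return paddr >> PAGE_SHIFT


def page_base(paddr):
    """Physical address of the start of the page containing paddr."""
    return paddr & PAGE_MASK


def offset_in_page(paddr):
    """Byte offset of paddr inside its page."""
    return paddr & (PAGE_SIZE - 1)
```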
init_uts_ns
-----------

The UTS namespace, which is used to isolate two specific elements of the
system that relate to the uname(2) system call. It is named after the
data structure used to store information returned by the uname(2) system
call.

User-space tools can get the kernel name, host name, kernel release
number, kernel version, architecture name and OS type from it.

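In the kernel, the name member of init_uts_ns is a struct new_utsname. It consists of six NUL-padded character arrays of 65 bytes each (__NEW_UTS_LEN + 1), holding sysname, nodename, release, version, machine and domainname, in that order. The sketch below decodes those bytes once a tool has read them from the dump. How the bytes are located and read is assumed and not shown, and the function name is hypothetical.

```python
UTS_FIELD_LEN = 65  # __NEW_UTS_LEN (64) + 1 for the terminating NUL
UTS_FIELDS = ("sysname", "nodename", "release", "version", "machine",
              "domainname")


def decode_utsname(raw):
    """Split the raw bytes of a struct new_utsname into named strings."""
    if len(raw) < UTS_FIELD_LEN * len(UTS_FIELDS):
        raise ValueError("buffer too short for struct new_utsname")
    result = {}
    for i, field in enumerate(UTS_FIELDS):
        chunk = raw[i * UTS_FIELD_LEN:(i + 1) * UTS_FIELD_LEN]
        # Each field is a C string: keep everything before the first NUL.
        result[field] = chunk.split(b"\0", 1)[0].decode("ascii", "replace")
    return result
```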
node_online_map
---------------

An array node_states[N_ONLINE] which represents the set of online nodes
in a system, one bit position per node number. Used to keep track of
which nodes are in the system and online.

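Because the mask holds one bit per node, a tool can read SIZE(nodemask_t) bytes at node_online_map and list the set bits. The sketch below assumes a little-endian target, as on x86_64, where bit n of the bitmap is bit (n % 8) of byte (n // 8). The function name is hypothetical.

```python
def online_nodes(mask_bytes):
    """Return the node IDs whose bits are set in a nodemask_t bitmap.

    Reading the whole bitmap as one little-endian integer makes bit n of
    the integer correspond to node n.
    """
    value = int.from_bytes(mask_bytes, "little")
    nodes = []
    node = 0
    while value:
        if value & 1:
            nodes.append(node)
        value >>= 1
        node += 1
    return nodes
```

The number of online nodes is then simply the length of the returned list.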
swapper_pg_dir
--------------

The global page directory pointer of the kernel. Used to translate
virtual to physical addresses.

_stext
------

Defines the beginning of the text section. In general, _stext indicates
the kernel start address. Used to convert a virtual address from the
direct kernel map to a physical address.

vmap_area_list
--------------

Stores the list of vmalloc areas (struct vmap_area). makedumpfile gets
the vmalloc start value from this variable; that value is necessary for
vmalloc address translation.

mem_map
-------

Physical addresses are translated to struct pages by treating them as
an index into the mem_map array. Right-shifting a physical address
PAGE_SHIFT bits converts it into a page frame number which is an index
into that mem_map array.

Used to map an address to the corresponding struct page.

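As arithmetic, with the flat memory model the struct page for a physical address lives at mem_map + pfn * SIZE(page). The sketch below makes two assumptions. First, the caller has already read the value of the mem_map pointer from the dump, since SYMBOL(mem_map) is the address of the variable rather than of the array. Second, memory starts at pfn 0 unless first_pfn says otherwise. The function name and the numbers in the test are invented.

```python
def flat_pfn_to_page(paddr, mem_map, page_size, sizeof_page, first_pfn=0):
    """Virtual address of the struct page describing physical address paddr.

    mem_map:     value of the kernel's mem_map pointer (start of the array)
    page_size:   PAGESIZE from VMCOREINFO
    sizeof_page: SIZE(page) from VMCOREINFO
    first_pfn:   pfn described by mem_map[0] (assumed 0 here)
    """
    page_shift = page_size.bit_length() - 1
    pfn = paddr >> page_shift               # page frame number
    return mem_map + (pfn - first_pfn) * sizeof_page
```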
contig_page_data
----------------

Makedumpfile gets the pglist_data structure from this symbol, which is
used to describe the memory layout.

User-space tools use this to exclude free pages when dumping memory.

mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
--------------------------------------------------------------------------

The address of the mem_section array, its length, structure size, and
the section_mem_map offset.

It exists in the sparse memory mapping model. It is somewhat similar to
the mem_map variable: both are used to translate an address.

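With SPARSEMEM, the pfn is first split into a section number. That section's section_mem_map, once its low flag bits are masked off, is an encoded pointer from which the struct page is reached by addition. The sketch below assumes the SPARSEMEM_EXTREME layout, a two-level table of root pointers, on a 64-bit target. It takes PFN_SECTION_SHIFT, SECTIONS_PER_ROOT (PAGE_SIZE / SIZE(mem_section)) and SECTION_MAP_LAST_BIT as parameters, because these vary by architecture and kernel version. read_u64 is a hypothetical dump-reading helper.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sparse_pfn_to_page(pfn, mem_section, read_u64, *, pfn_section_shift,
                       sections_per_root, sizeof_mem_section,
                       off_section_mem_map, sizeof_page, section_map_last_bit):
    """Translate a pfn to its struct page address under SPARSEMEM_EXTREME.

    mem_section: SYMBOL(mem_section), the address of the root pointer array
    read_u64:    hypothetical callable reading a 64-bit value from the dump
    Returns None if the section is not present.
    """
    section_nr = pfn >> pfn_section_shift
    root_index = section_nr // sections_per_root
    # Each root entry points to an array of SECTIONS_PER_ROOT mem_section structs.
    root = read_u64(mem_section + root_index * 8)
    if root == 0:
        return None
    section = root + (section_nr % sections_per_root) * sizeof_mem_section
    coded = read_u64(section + off_section_mem_map)
    # The low bits of section_mem_map carry flags; mask them off.
    map_addr = coded & ~((1 << section_map_last_bit) - 1) & MASK64
    if map_addr == 0:
        return None
    # The stored pointer is the section's mem_map minus its first pfn (in
    # struct page units), so adding pfn * SIZE(page) bytes lands on the entry.
    return (map_addr + pfn * sizeof_page) & MASK64
```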
page
----

The size of a page structure. struct page is an important data
structure, and because struct page entries are laid out contiguously in
memory maps, this size is needed to step from one page's entry to the
next.

pglist_data
-----------

The size of a pglist_data structure. This value is used to check if the
pglist_data structure is valid. It is also used for checking the memory
type.

zone
----

The size of a zone structure. This value is used to check if the zone
structure has been found. It is also used for excluding free pages.

free_area
---------

The size of a free_area structure. It indicates whether the free_area
structure is valid or not. Useful when excluding free pages.

list_head
---------

The size of a list_head structure. Used when iterating lists in a
post-mortem analysis session.

nodemask_t
----------

The size of a nodemask_t type. Used to compute the number of online
nodes.

(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
----------------------------------------------------------------------------------------------------

User-space tools read these members of struct page at the exported
offsets to determine each page's state. The values are used when
excluding unnecessary pages.

(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_spanned_pages|node_id)
------------------------------------------------------------------------------------------

On NUMA machines, each NUMA node has a pg_data_t to describe its memory
layout. On UMA machines there is a single pglist_data which describes the
whole memory.

These values are used to check the memory type and to compute the
virtual address of the memory map.

(zone, free_area|vm_stat|spanned_pages)
---------------------------------------

Each node is divided into a number of blocks called zones which
represent ranges within memory. A zone is described by a struct zone.

User-space tools use the offsets of these members to read the
corresponding values from each struct zone.

(free_area, free_list)
----------------------

Offset of the free_list member. This value is used to compute the number
of free pages.

Each zone has a free_area structure array called free_area[MAX_ORDER].
The free_list represents a linked list of free page blocks.

(list_head, next|prev)
----------------------

Offsets of the list_head's members. list_head is used to define a
circular linked list. User-space tools need these in order to traverse
lists.

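To walk such a list in a dump, a tool follows next pointers from the list head until they lead back to it. It then subtracts the embedded member's offset, the container_of() idiom, to reach each enclosing structure. The sketch below works over a simulated memory image. read_u64 stands in for whatever routine a tool uses to read the dump, and the function name and addresses are hypothetical.

```python
def walk_list(head, read_u64, off_next, off_member):
    """Yield the addresses of the structures linked on a circular list.

    head:       address of the list_head that anchors the list
    read_u64:   callable returning the 64-bit value stored at an address
    off_next:   OFFSET(list_head.next) from VMCOREINFO
    off_member: offset of the embedded list_head inside the containing
                structure, e.g. OFFSET(vmap_area.list)
    """
    seen = set()
    node = read_u64(head + off_next)
    while node != head:
        if node in seen:           # guard against corrupted lists in a dump
            raise ValueError("list loop detected at %#x" % node)
        seen.add(node)
        yield node - off_member    # container_of(node, type, member)
        node = read_u64(node + off_next)
```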
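(vmap_area, va_start|list)
--------------------------

Offsets of the vmap_area's members. They carry vmalloc-specific
information. Makedumpfile gets the start address of the vmalloc region
from this.

(zone.free_area, MAX_ORDER)
---------------------------

The length of the free_area array in struct zone, i.e. MAX_ORDER, the
number of allocation orders managed by the zone's buddy allocator.
User-space tools use this value to iterate over the free_area entries.

log_first_idx
-------------

Index of the first record stored in the buffer log_buf. Used by
user-space tools to read the strings in the log_buf.

log_buf
-------

Console output is written to the ring buffer log_buf. The oldest record
still held in it starts at index log_first_idx. Used to get the kernel
log.

log_buf_len
-----------

The length of log_buf in bytes.

clear_idx
---------

The index of the next printk() record to read after the last clear
command. It indicates the first record after the last
SYSLOG_ACTION_CLEAR, such as one issued by 'dmesg -c'. Used by
user-space tools to dump the dmesg log.

log_next_idx
------------

The index of the next record to store in the buffer log_buf. Used to
compute the index of the current buffer position.

printk_log
----------

The size of struct printk_log. Used to compute the size of messages and
to extract the dmesg log. It encapsulates the header information of each
log_buf record, such as the timestamp and syslog level.

(printk_log, ts_nsec|len|text_len|dict_len)
-------------------------------------------

The field offsets in struct printk_log. User-space tools parse them and
check whether the layout of printk_log's members has changed.
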
205 | |||
206 | log_buf's length. | ||
207 | |||
208 | clear_idx | ||
209 | --------- | ||
210 | |||
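In the buddy allocator, each block on free_area[order] spans 2^order pages. A tool that has counted the entries on every free_list[migratetype] for every order can therefore total the free pages as shown below. Counting the entries requires walking each free_list like any other list_head. Here the counts are passed in directly, and the function name and sample numbers are hypothetical.

```python
def count_free_pages(list_lengths):
    """Total free pages in one zone.

    list_lengths[order][migratetype] is the number of free blocks found on
    zone->free_area[order].free_list[migratetype]; each block covers
    2**order pages.
    """
    total = 0
    for order, per_type in enumerate(list_lengths):
        total += sum(per_type) << order
    return total
```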
211 | The index that the next printk() record to read after the last clear | ||
212 | command. It indicates the first record after the last SYSLOG_ACTION | ||
213 | _CLEAR, like issued by 'dmesg -c'. Used by user-space tools to dump | ||
214 | the dmesg log. | ||
215 | |||
216 | log_next_idx | ||
217 | ------------ | ||
218 | |||
219 | The index of the next record to store in the buffer log_buf. Used to | ||
220 | compute the index of the current buffer position. | ||
221 | |||
222 | printk_log | ||
223 | ---------- | ||
224 | |||
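The sketch below shows how a filter might use these numbers by classifying a page from its flags and _mapcount words. It makes three assumptions. First, PG_lru, PG_slab and similar entries are exported as bit numbers. Second, _mapcount is read as a signed 32-bit value. Third, a free buddy page is recognized by _mapcount being exactly PAGE_BUDDY_MAPCOUNT_VALUE. Real tools such as makedumpfile apply more elaborate, version-dependent rules. The labels, function name and numeric values are illustrative only.

```python
def classify_page(flags, mapcount, numbers):
    """Return a coarse label for a page, given values read from the dump.

    flags:    the page's flags word (at OFFSET(page.flags))
    mapcount: the page's _mapcount as a signed int (at OFFSET(page._mapcount))
    numbers:  dict of NUMBER(...) entries from VMCOREINFO
    """
    def has(bit_name):
        return bool(flags & (1 << numbers[bit_name]))

    if mapcount == numbers["PAGE_BUDDY_MAPCOUNT_VALUE"]:
        return "free"          # on a buddy free list
    if has("PG_hwpoison"):
        return "hwpoison"      # memory with a hardware error; avoid reading it
    if has("PG_slab"):
        return "slab"
    if has("PG_lru") and not has("PG_swapbacked"):
        return "page cache"    # file-backed page on an LRU list
    if has("PG_lru"):
        return "anonymous"     # swap-backed page on an LRU list
    return "other"
```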
225 | The size of a structure printk_log. Used to compute the size of | ||
226 | messages, and extract dmesg log. It encapsulates header information for | ||
227 | log_buf, such as timestamp, syslog level, etc. | ||
228 | |||
229 | (printk_log, ts_nsec|len|text_len|dict_len) | ||
230 | ------------------------------------------- | ||
231 | |||
232 | It represents field offsets in struct printk_log. User space tools | ||
233 | parse it and check whether the values of printk_log's members have been | ||
234 | changed. | ||
235 | |||
236 | (free_area.free_list, MIGRATE_TYPES) | ||
237 | ------------------------------------ | ||
238 | |||
239 | The number of migrate types for pages. The free_list is described by the | ||
240 | array. Used by tools to compute the number of free pages. | ||
241 | |||
242 | NR_FREE_PAGES | ||
243 | ------------- | ||
244 | |||
245 | On linux-2.6.21 or later, the number of free pages is in | ||
246 | vm_stat[NR_FREE_PAGES]. Used to get the number of free pages. | ||
247 | |||
248 | PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision | ||
249 | |PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy) | ||
250 | |PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline) | ||
251 | ----------------------------------------------------------------- | ||
252 | |||
253 | Page attributes. These flags are used to filter various unnecessary for | ||
254 | dumping pages. | ||
255 | |||
256 | HUGETLB_PAGE_DTOR | ||
257 | ----------------- | ||
258 | |||
259 | The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile | ||
260 | excludes these pages. | ||
261 | |||
262 | ====== | ||
263 | x86_64 | ||
264 | ====== | ||
265 | |||
266 | phys_base | ||
267 | --------- | ||
268 | |||
269 | Used to convert the virtual address of an exported kernel symbol to its | ||
270 | corresponding physical address. | ||
271 | |||
272 | init_top_pgt | ||
273 | ------------ | ||
274 | |||
275 | Used to walk through the whole page table and convert virtual addresses | ||
276 | to physical addresses. The init_top_pgt is somewhat similar to | ||
277 | swapper_pg_dir, but it is only used in x86_64. | ||
278 | |||
279 | pgtable_l5_enabled | ||
280 | ------------------ | ||
281 | |||
282 | User-space tools need to know whether the crash kernel was in 5-level | ||
283 | paging mode. | ||
284 | |||
285 | node_data | ||
286 | --------- | ||
287 | |||
288 | This is a struct pglist_data array and stores all NUMA nodes | ||
289 | information. Makedumpfile gets the pglist_data structure from it. | ||
290 | |||
291 | (node_data, MAX_NUMNODES) | ||
292 | ------------------------- | ||
293 | |||
294 | The maximum number of nodes in system. | ||
295 | |||
296 | KERNELOFFSET | ||
297 | ------------ | ||
298 | |||
299 | The kernel randomization offset. Used to compute the page offset. If | ||
300 | KASLR is disabled, this value is zero. | ||
301 | |||
302 | KERNEL_IMAGE_SIZE | ||
303 | ----------------- | ||
304 | |||
305 | Currently unused by Makedumpfile. Used to compute the module virtual | ||
306 | address by Crash. | ||
307 | |||
308 | sme_mask | ||
309 | -------- | ||
310 | |||
311 | AMD-specific with SME support: it indicates the secure memory encryption | ||
312 | mask. Makedumpfile tools need to know whether the crash kernel was | ||
313 | encrypted. If SME is enabled in the first kernel, the crash kernel's | ||
314 | page table entries (pgd/pud/pmd/pte) contain the memory encryption | ||
315 | mask. This is used to remove the SME mask and obtain the true physical | ||
316 | address. | ||
317 | |||
318 | Currently, sme_mask stores the value of the C-bit position. If needed, | ||
319 | additional SME-relevant info can be placed in that variable. | ||
320 | |||
321 | For example: | ||
322 | [ misc ][ enc bit ][ other misc SME info ] | ||
323 | 0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000 | ||
324 | 63 59 55 51 47 43 39 35 31 27 ... 3 | ||
325 | |||
326 | ====== | ||
327 | x86_32 | ||
328 | ====== | ||
329 | |||
330 | X86_PAE | ||
331 | ------- | ||
332 | |||
333 | Denotes whether physical address extensions are enabled. It has the cost | ||
334 | of a higher page table lookup overhead, and also consumes more page | ||
335 | table space per process. Used to check whether PAE was enabled in the | ||
336 | crash kernel when converting virtual addresses to physical addresses. | ||
337 | |||
338 | ==== | ||
339 | ia64 | ||
340 | ==== | ||
341 | |||
342 | pgdat_list|(pgdat_list, MAX_NUMNODES) | ||
343 | ------------------------------------- | ||
344 | |||
345 | pg_data_t array storing all NUMA nodes information. MAX_NUMNODES | ||
346 | indicates the number of the nodes. | ||
347 | |||
348 | node_memblk|(node_memblk, NR_NODE_MEMBLKS) | ||
349 | ------------------------------------------ | ||
350 | |||
351 | List of node memory chunks. Filled when parsing the SRAT table to obtain | ||
352 | information about memory nodes. NR_NODE_MEMBLKS indicates the number of | ||
353 | node memory chunks. | ||
354 | |||
355 | These values are used to compute the number of nodes the crashed kernel used. | ||
356 | |||
357 | node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size) | ||
358 | ---------------------------------------------------------------- | ||
359 | |||
360 | The size of a struct node_memblk_s and the offsets of the | ||
361 | node_memblk_s's members. Used to compute the number of nodes. | ||
362 | |||
363 | PGTABLE_3|PGTABLE_4 | ||
364 | ------------------- | ||
365 | |||
366 | User-space tools need to know whether the crash kernel was in 3-level or | ||
367 | 4-level paging mode. Used to distinguish the page table. | ||
368 | |||
369 | ===== | ||
370 | ARM64 | ||
371 | ===== | ||
372 | |||
373 | VA_BITS | ||
374 | ------- | ||
375 | |||
376 | The maximum number of bits for virtual addresses. Used to compute the | ||
377 | virtual memory ranges. | ||
378 | |||
379 | kimage_voffset | ||
380 | -------------- | ||
381 | |||
382 | The offset between the kernel virtual and physical mappings. Used to | ||
383 | translate virtual to physical addresses. | ||
384 | |||
385 | PHYS_OFFSET | ||
386 | ----------- | ||
387 | |||
388 | Indicates the physical address of the start of memory. Similar to | ||
389 | kimage_voffset, which is used to translate virtual to physical | ||
390 | addresses. | ||
391 | |||
392 | KERNELOFFSET | ||
393 | ------------ | ||
394 | |||
395 | The kernel randomization offset. Used to compute the page offset. If | ||
396 | KASLR is disabled, this value is zero. | ||
397 | |||
398 | ==== | ||
399 | arm | ||
400 | ==== | ||
401 | |||
402 | ARM_LPAE | ||
403 | -------- | ||
404 | |||
405 | It indicates whether the crash kernel supports large physical address | ||
406 | extensions. Used to translate virtual to physical addresses. | ||
407 | |||
408 | ==== | ||
409 | s390 | ||
410 | ==== | ||
411 | |||
412 | lowcore_ptr | ||
413 | ---------- | ||
414 | |||
415 | An array with a pointer to the lowcore of every CPU. Used to print the | ||
416 | psw and all registers information. | ||
417 | |||
418 | high_memory | ||
419 | ----------- | ||
420 | |||
421 | Used to get the vmalloc_start address from the high_memory symbol. | ||
422 | |||
423 | (lowcore_ptr, NR_CPUS) | ||
424 | ---------------------- | ||
425 | |||
426 | The maximum number of CPUs. | ||
427 | |||
428 | ======= | ||
429 | powerpc | ||
430 | ======= | ||
431 | |||
432 | |||
433 | node_data|(node_data, MAX_NUMNODES) | ||
434 | ----------------------------------- | ||
435 | |||
436 | See above. | ||
437 | |||
438 | contig_page_data | ||
439 | ---------------- | ||
440 | |||
441 | See above. | ||
442 | |||
443 | vmemmap_list | ||
444 | ------------ | ||
445 | |||
446 | The vmemmap_list maintains the entire vmemmap physical mapping. Used | ||
447 | to get vmemmap list count and populated vmemmap regions info. If the | ||
448 | vmemmap address translation information is stored in the crash kernel, | ||
449 | it is used to translate vmemmap kernel virtual addresses. | ||
450 | |||
451 | mmu_vmemmap_psize | ||
452 | ----------------- | ||
453 | |||
454 | The size of a page. Used to translate virtual to physical addresses. | ||
455 | |||
456 | mmu_psize_defs | ||
457 | -------------- | ||
458 | |||
459 | Page size definitions, i.e. 4k, 64k, or 16M. | ||
460 | |||
461 | Used to make vtop translations. | ||
462 | |||
463 | vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)| | ||
464 | (vmemmap_backing, virt_addr) | ||
465 | ---------------------------------------------------------------- | ||
466 | |||
467 | The vmemmap virtual address space management does not have a traditional | ||
468 | page table to track which virtual struct pages are backed by a physical | ||
469 | mapping. The virtual to physical mappings are tracked in a simple linked | ||
470 | list format. | ||
471 | |||
472 | User-space tools need to know the offset of list, phys and virt_addr | ||
473 | when computing the count of vmemmap regions. | ||
474 | |||
475 | mmu_psize_def|(mmu_psize_def, shift) | ||
476 | ------------------------------------ | ||
477 | |||
478 | The size of a struct mmu_psize_def and the offset of mmu_psize_def's | ||
479 | member. | ||
480 | |||
481 | Used in vtop translations. | ||
482 | |||
483 | == | ||
484 | sh | ||
485 | == | ||
486 | |||
487 | node_data|(node_data, MAX_NUMNODES) | ||
488 | ----------------------------------- | ||
489 | |||
490 | See above. | ||
491 | |||
492 | X2TLB | ||
493 | ----- | ||
494 | |||
495 | Indicates whether the crashed kernel enabled SH extended mode. | ||