aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@woody.linux-foundation.org>2007-02-14 12:46:06 -0500
committerLinus Torvalds <torvalds@woody.linux-foundation.org>2007-02-14 12:46:06 -0500
commit414f827c46973ba39320cfb43feb55a0eeb9b4e8 (patch)
tree45e860974ef698e71370a0ebdddcff4f14fbdf9e /Documentation
parent86a71dbd3e81e8870d0f0e56b87875f57e58222b (diff)
parent126b1922367fbe5513daa675a2abd13ed3917f4e (diff)
Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits) [PATCH] x86-64: Remove mk_pte_phys() [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386 [PATCH] i386: fix 32-bit ioctls on x64_32 [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64 [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header. [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c [PATCH] i386: Move mce_disabled to asm/mce.h [PATCH] i386: paravirt unhandled fallthrough [PATCH] x86_64: Wire up compat epoll_pwait [PATCH] x86: Don't require the vDSO for handling a.out signals [PATCH] i386: Fix Cyrix MediaGX detection [PATCH] i386: Fix warning in cpu initialization [PATCH] i386: Fix warning in microcode.c [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo [PATCH] i386: Remove fastcall in paravirt.[ch] [PATCH] x86-64: Fix wrong gcc check in bitops.h [PATCH] x86-64: survive having no irq mapping for a vector [PATCH] i386: geode configuration fixes [PATCH] i386: add option to show more code in oops reports ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/kernel-parameters.txt8
-rw-r--r--Documentation/x86_64/boot-options.txt132
-rw-r--r--Documentation/x86_64/cpu-hotplug-spec2
-rw-r--r--Documentation/x86_64/kernel-stacks26
-rw-r--r--Documentation/x86_64/machinecheck70
-rw-r--r--Documentation/x86_64/mm.txt22
6 files changed, 186 insertions, 74 deletions
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d25acd51e181..22b19962a1a2 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -104,6 +104,9 @@ loader, and have no meaning to the kernel directly.
104Do not modify the syntax of boot loader parameters without extreme 104Do not modify the syntax of boot loader parameters without extreme
105need or coordination with <Documentation/i386/boot.txt>. 105need or coordination with <Documentation/i386/boot.txt>.
106 106
107There are also arch-specific kernel-parameters not documented here.
108See for example <Documentation/x86_64/boot-options.txt>.
109
107Note that ALL kernel parameters listed below are CASE SENSITIVE, and that 110Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
108a trailing = on the name of any parameter states that that parameter will 111a trailing = on the name of any parameter states that that parameter will
109be entered as an environment variable, whereas its absence indicates that 112be entered as an environment variable, whereas its absence indicates that
@@ -361,6 +364,11 @@ and is between 256 and 4096 characters. It is defined in the file
361 clocksource is not available, it defaults to PIT. 364 clocksource is not available, it defaults to PIT.
362 Format: { pit | tsc | cyclone | pmtmr } 365 Format: { pit | tsc | cyclone | pmtmr }
363 366
367 code_bytes [IA32] How many bytes of object code to print in an
368 oops report.
369 Range: 0 - 8192
370 Default: 64
371
364 disable_8254_timer 372 disable_8254_timer
365 enable_8254_timer 373 enable_8254_timer
366 [IA32/X86_64] Disable/Enable interrupt 0 timer routing 374 [IA32/X86_64] Disable/Enable interrupt 0 timer routing
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 5c86ed6f0448..625a21db0c2a 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -180,40 +180,81 @@ PCI
180 pci=lastbus=NUMBER Scan upto NUMBER busses, no matter what the mptable says. 180 pci=lastbus=NUMBER Scan upto NUMBER busses, no matter what the mptable says.
181 pci=noacpi Don't use ACPI to set up PCI interrupt routing. 181 pci=noacpi Don't use ACPI to set up PCI interrupt routing.
182 182
183IOMMU 183IOMMU (input/output memory management unit)
184 184
185 iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge] 185 Currently four x86-64 PCI-DMA mapping implementations exist:
186 [,forcesac][,fullflush][,nomerge][,noaperture][,calgary] 186
187 size set size of iommu (in bytes) 187 1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
188 noagp don't initialize the AGP driver and use full aperture. 188 (e.g. because you have < 3 GB memory).
189 off don't use the IOMMU 189 Kernel boot message: "PCI-DMA: Disabling IOMMU"
190 leak turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on) 190
191 memaper[=order] allocate an own aperture over RAM with size 32MB^order. 191 2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU.
192 noforce don't force IOMMU usage. Default. 192 Kernel boot message: "PCI-DMA: using GART IOMMU"
193 force Force IOMMU. 193
194 merge Do SG merging. Implies force (experimental) 194 3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
195 nomerge Don't do SG merging. 195 e.g. if there is no hardware IOMMU in the system and it is need because
196 forcesac For SAC mode for masks <40bits (experimental) 196 you have >3GB memory or told the kernel to us it (iommu=soft))
197 fullflush Flush IOMMU on each allocation (default) 197 Kernel boot message: "PCI-DMA: Using software bounce buffering
198 nofullflush Don't use IOMMU fullflush 198 for IO (SWIOTLB)"
199 allowed overwrite iommu off workarounds for specific chipsets. 199
200 soft Use software bounce buffering (default for Intel machines) 200 4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
201 noaperture Don't touch the aperture for AGP. 201 pSeries and xSeries servers. This hardware IOMMU supports DMA address
202 allowdac Allow DMA >4GB 202 mapping with memory protection, etc.
203 When off all DMA over >4GB is forced through an IOMMU or bounce 203 Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
204 buffering. 204
205 nodac Forbid DMA >4GB 205 iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]
206 panic Always panic when IOMMU overflows 206 [,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
207 calgary Use the Calgary IOMMU if it is available 207 [,noaperture][,calgary]
208 208
209 swiotlb=pages[,force] 209 General iommu options:
210 210 off Don't initialize and use any kind of IOMMU.
211 pages Prereserve that many 128K pages for the software IO bounce buffering. 211 noforce Don't force hardware IOMMU usage when it is not needed.
212 force Force all IO through the software TLB. 212 (default).
213 213 force Force the use of the hardware IOMMU even when it is
214 calgary=[64k,128k,256k,512k,1M,2M,4M,8M] 214 not actually needed (e.g. because < 3 GB memory).
215 calgary=[translate_empty_slots] 215 soft Use software bounce buffering (SWIOTLB) (default for
216 calgary=[disable=<PCI bus number>] 216 Intel machines). This can be used to prevent the usage
217 of an available hardware IOMMU.
218
219 iommu options only relevant to the AMD GART hardware IOMMU:
220 <size> Set the size of the remapping area in bytes.
221 allowed Overwrite iommu off workarounds for specific chipsets.
222 fullflush Flush IOMMU on each allocation (default).
223 nofullflush Don't use IOMMU fullflush.
224 leak Turn on simple iommu leak tracing (only when
225 CONFIG_IOMMU_LEAK is on). Default number of leak pages
226 is 20.
227 memaper[=<order>] Allocate an own aperture over RAM with size 32MB<<order.
228 (default: order=1, i.e. 64MB)
229 merge Do scatter-gather (SG) merging. Implies "force"
230 (experimental).
231 nomerge Don't do scatter-gather (SG) merging.
232 noaperture Ask the IOMMU not to touch the aperture for AGP.
233 forcesac Force single-address cycle (SAC) mode for masks <40bits
234 (experimental).
235 noagp Don't initialize the AGP driver and use full aperture.
236 allowdac Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
237 DAC is used with 32-bit PCI to push a 64-bit address in
238 two cycles. When off all DMA over >4GB is forced through
239 an IOMMU or software bounce buffering.
240 nodac Forbid DAC mode, i.e. DMA >4GB.
241 panic Always panic when IOMMU overflows.
242 calgary Use the Calgary IOMMU if it is available
243
244 iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
245 implementation:
246 swiotlb=<pages>[,force]
247 <pages> Prereserve that many 128K pages for the software IO
248 bounce buffering.
249 force Force all IO through the software TLB.
250
251 Settings for the IBM Calgary hardware IOMMU currently found in IBM
252 pSeries and xSeries machines:
253
254 calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
255 calgary=[translate_empty_slots]
256 calgary=[disable=<PCI bus number>]
257 panic Always panic when IOMMU overflows
217 258
218 64k,...,8M - Set the size of each PCI slot's translation table 259 64k,...,8M - Set the size of each PCI slot's translation table
219 when using the Calgary IOMMU. This is the size of the translation 260 when using the Calgary IOMMU. This is the size of the translation
@@ -234,14 +275,14 @@ IOMMU
234 275
235Debugging 276Debugging
236 277
237 oops=panic Always panic on oopses. Default is to just kill the process, 278 oops=panic Always panic on oopses. Default is to just kill the process,
238 but there is a small probability of deadlocking the machine. 279 but there is a small probability of deadlocking the machine.
239 This will also cause panics on machine check exceptions. 280 This will also cause panics on machine check exceptions.
240 Useful together with panic=30 to trigger a reboot. 281 Useful together with panic=30 to trigger a reboot.
241 282
242 kstack=N Print that many words from the kernel stack in oops dumps. 283 kstack=N Print N words from the kernel stack in oops dumps.
243 284
244 pagefaulttrace Dump all page faults. Only useful for extreme debugging 285 pagefaulttrace Dump all page faults. Only useful for extreme debugging
245 and will create a lot of output. 286 and will create a lot of output.
246 287
247 call_trace=[old|both|newfallback|new] 288 call_trace=[old|both|newfallback|new]
@@ -251,15 +292,8 @@ Debugging
251 newfallback: use new unwinder but fall back to old if it gets 292 newfallback: use new unwinder but fall back to old if it gets
252 stuck (default) 293 stuck (default)
253 294
254 call_trace=[old|both|newfallback|new] 295Miscellaneous
255 old: use old inexact backtracer
256 new: use new exact dwarf2 unwinder
257 both: print entries from both
258 newfallback: use new unwinder but fall back to old if it gets
259 stuck (default)
260
261Misc
262 296
263 noreplacement Don't replace instructions with more appropriate ones 297 noreplacement Don't replace instructions with more appropriate ones
264 for the CPU. This may be useful on asymmetric MP systems 298 for the CPU. This may be useful on asymmetric MP systems
265 where some CPU have less capabilities than the others. 299 where some CPUs have less capabilities than others.
diff --git a/Documentation/x86_64/cpu-hotplug-spec b/Documentation/x86_64/cpu-hotplug-spec
index 5c0fa345e556..3c23e0587db3 100644
--- a/Documentation/x86_64/cpu-hotplug-spec
+++ b/Documentation/x86_64/cpu-hotplug-spec
@@ -2,7 +2,7 @@ Firmware support for CPU hotplug under Linux/x86-64
2--------------------------------------------------- 2---------------------------------------------------
3 3
4Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to 4Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to
5know in advance boot time the maximum number of CPUs that could be plugged 5know in advance of boot time the maximum number of CPUs that could be plugged
6into the system. ACPI 3.0 currently has no official way to supply 6into the system. ACPI 3.0 currently has no official way to supply
7this information from the firmware to the operating system. 7this information from the firmware to the operating system.
8 8
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks
index bddfddd466ab..5ad65d51fb95 100644
--- a/Documentation/x86_64/kernel-stacks
+++ b/Documentation/x86_64/kernel-stacks
@@ -9,9 +9,9 @@ zombie. While the thread is in user space the kernel stack is empty
9except for the thread_info structure at the bottom. 9except for the thread_info structure at the bottom.
10 10
11In addition to the per thread stacks, there are specialized stacks 11In addition to the per thread stacks, there are specialized stacks
12associated with each cpu. These stacks are only used while the kernel 12associated with each CPU. These stacks are only used while the kernel
13is in control on that cpu, when a cpu returns to user space the 13is in control on that CPU; when a CPU returns to user space the
14specialized stacks contain no useful data. The main cpu stacks is 14specialized stacks contain no useful data. The main CPU stacks are:
15 15
16* Interrupt stack. IRQSTACKSIZE 16* Interrupt stack. IRQSTACKSIZE
17 17
@@ -32,17 +32,17 @@ x86_64 also has a feature which is not available on i386, the ability
32to automatically switch to a new stack for designated events such as 32to automatically switch to a new stack for designated events such as
33double fault or NMI, which makes it easier to handle these unusual 33double fault or NMI, which makes it easier to handle these unusual
34events on x86_64. This feature is called the Interrupt Stack Table 34events on x86_64. This feature is called the Interrupt Stack Table
35(IST). There can be up to 7 IST entries per cpu. The IST code is an 35(IST). There can be up to 7 IST entries per CPU. The IST code is an
36index into the Task State Segment (TSS), the IST entries in the TSS 36index into the Task State Segment (TSS). The IST entries in the TSS
37point to dedicated stacks, each stack can be a different size. 37point to dedicated stacks; each stack can be a different size.
38 38
39An IST is selected by an non-zero value in the IST field of an 39An IST is selected by a non-zero value in the IST field of an
40interrupt-gate descriptor. When an interrupt occurs and the hardware 40interrupt-gate descriptor. When an interrupt occurs and the hardware
41loads such a descriptor, the hardware automatically sets the new stack 41loads such a descriptor, the hardware automatically sets the new stack
42pointer based on the IST value, then invokes the interrupt handler. If 42pointer based on the IST value, then invokes the interrupt handler. If
43software wants to allow nested IST interrupts then the handler must 43software wants to allow nested IST interrupts then the handler must
44adjust the IST values on entry to and exit from the interrupt handler. 44adjust the IST values on entry to and exit from the interrupt handler.
45(this is occasionally done, e.g. for debug exceptions) 45(This is occasionally done, e.g. for debug exceptions.)
46 46
47Events with different IST codes (i.e. with different stacks) can be 47Events with different IST codes (i.e. with different stacks) can be
48nested. For example, a debug interrupt can safely be interrupted by an 48nested. For example, a debug interrupt can safely be interrupted by an
@@ -58,17 +58,17 @@ The currently assigned IST stacks are :-
58 58
59 Used for interrupt 12 - Stack Fault Exception (#SS). 59 Used for interrupt 12 - Stack Fault Exception (#SS).
60 60
61 This allows to recover from invalid stack segments. Rarely 61 This allows the CPU to recover from invalid stack segments. Rarely
62 happens. 62 happens.
63 63
64* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). 64* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
65 65
66 Used for interrupt 8 - Double Fault Exception (#DF). 66 Used for interrupt 8 - Double Fault Exception (#DF).
67 67
68 Invoked when handling a exception causes another exception. Happens 68 Invoked when handling one exception causes another exception. Happens
69 when the kernel is very confused (e.g. kernel stack pointer corrupt) 69 when the kernel is very confused (e.g. kernel stack pointer corrupt).
70 Using a separate stack allows to recover from it well enough in many 70 Using a separate stack allows the kernel to recover from it well enough
71 cases to still output an oops. 71 in many cases to still output an oops.
72 72
73* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE). 73* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
74 74
diff --git a/Documentation/x86_64/machinecheck b/Documentation/x86_64/machinecheck
new file mode 100644
index 000000000000..068a6d9904b9
--- /dev/null
+++ b/Documentation/x86_64/machinecheck
@@ -0,0 +1,70 @@
1
2Configurable sysfs parameters for the x86-64 machine check code.
3
4Machine checks report internal hardware error conditions detected
5by the CPU. Uncorrected errors typically cause a machine check
6(often with panic), corrected ones cause a machine check log entry.
7
8Machine checks are organized in banks (normally associated with
9a hardware subsystem) and subevents in a bank. The exact meaning
10of the banks and subevent is CPU specific.
11
12mcelog knows how to decode them.
13
14When you see the "Machine check errors logged" message in the system
15log then mcelog should run to collect and decode machine check entries
16from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.
17
18Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
19(N = CPU number)
20
21The directory contains some configurable entries:
22
23Entries:
24
25bankNctl
26(N bank number)
27 64bit Hex bitmask enabling/disabling specific subevents for bank N
28 When a bit in the bitmask is zero then the respective
29 subevent will not be reported.
30 By default all events are enabled.
31 Note that BIOS maintain another mask to disable specific events
32 per bank. This is not visible here
33
34The following entries appear for each CPU, but they are truly shared
35between all CPUs.
36
37check_interval
38 How often to poll for corrected machine check errors, in seconds
39 (Note output is hexademical). Default 5 minutes.
40
41tolerant
42 Tolerance level. When a machine check exception occurs for a non
43 corrected machine check the kernel can take different actions.
44 Since machine check exceptions can happen any time it is sometimes
45 risky for the kernel to kill a process because it defies
46 normal kernel locking rules. The tolerance level configures
47 how hard the kernel tries to recover even at some risk of deadlock.
48
49 0: always panic,
50 1: panic if deadlock possible,
51 2: try to avoid panic,
52 3: never panic or exit (for testing only)
53
54 Default: 1
55
56 Note this only makes a difference if the CPU allows recovery
57 from a machine check exception. Current x86 CPUs generally do not.
58
59trigger
60 Program to run when a machine check event is detected.
61 This is an alternative to running mcelog regularly from cron
62 and allows to detect events faster.
63
64TBD document entries for AMD threshold interrupt configuration
65
66For more details about the x86 machine check architecture
67see the Intel and AMD architecture manuals from their developer websites.
68
69For more details about the architecture see
70see http://one.firstfloor.org/~andi/mce.pdf
diff --git a/Documentation/x86_64/mm.txt b/Documentation/x86_64/mm.txt
index 133561b9cb0c..f42798ed1c54 100644
--- a/Documentation/x86_64/mm.txt
+++ b/Documentation/x86_64/mm.txt
@@ -3,26 +3,26 @@
3 3
4Virtual memory map with 4 level page tables: 4Virtual memory map with 4 level page tables:
5 5
60000000000000000 - 00007fffffffffff (=47bits) user space, different per mm 60000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
7hole caused by [48:63] sign extension 7hole caused by [48:63] sign extension
8ffff800000000000 - ffff80ffffffffff (=40bits) guard hole 8ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
9ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of all phys. memory 9ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory
10ffffc10000000000 - ffffc1ffffffffff (=40bits) hole 10ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
11ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space 11ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
12... unused hole ... 12... unused hole ...
13ffffffff80000000 - ffffffff82800000 (=40MB) kernel text mapping, from phys 0 13ffffffff80000000 - ffffffff82800000 (=40 MB) kernel text mapping, from phys 0
14... unused hole ... 14... unused hole ...
15ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space 15ffffffff88000000 - fffffffffff00000 (=1919 MB) module mapping space
16 16
17The direct mapping covers all memory in the system upto the highest 17The direct mapping covers all memory in the system up to the highest
18memory address (this means in some cases it can also include PCI memory 18memory address (this means in some cases it can also include PCI memory
19holes) 19holes).
20 20
21vmalloc space is lazily synchronized into the different PML4 pages of 21vmalloc space is lazily synchronized into the different PML4 pages of
22the processes using the page fault handler, with init_level4_pgt as 22the processes using the page fault handler, with init_level4_pgt as
23reference. 23reference.
24 24
25Current X86-64 implementations only support 40 bit of address space, 25Current X86-64 implementations only support 40 bits of address space,
26but we support upto 46bits. This expands into MBZ space in the page tables. 26but we support up to 46 bits. This expands into MBZ space in the page tables.
27 27
28-Andi Kleen, Jul 2004 28-Andi Kleen, Jul 2004