Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
  [PATCH] x86-64: Remove mk_pte_phys()
  [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
  [PATCH] i386: fix 32-bit ioctls on x64_32
  [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
  [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
  [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
  [PATCH] i386: Move mce_disabled to asm/mce.h
  [PATCH] i386: paravirt unhandled fallthrough
  [PATCH] x86_64: Wire up compat epoll_pwait
  [PATCH] x86: Don't require the vDSO for handling a.out signals
  [PATCH] i386: Fix Cyrix MediaGX detection
  [PATCH] i386: Fix warning in cpu initialization
  [PATCH] i386: Fix warning in microcode.c
  [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
  [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
  [PATCH] i386: Remove fastcall in paravirt.[ch]
  [PATCH] x86-64: Fix wrong gcc check in bitops.h
  [PATCH] x86-64: survive having no irq mapping for a vector
  [PATCH] i386: geode configuration fixes
  [PATCH] i386: add option to show more code in oops reports
  ...
author: Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-02-14 12:46:06 -0500
committer: Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-02-14 12:46:06 -0500
commit: 414f827c46973ba39320cfb43feb55a0eeb9b4e8 (patch)
tree: 45e860974ef698e71370a0ebdddcff4f14fbdf9e /Documentation
parent: 86a71dbd3e81e8870d0f0e56b87875f57e58222b (diff)
parent: 126b1922367fbe5513daa675a2abd13ed3917f4e (diff)
6 files changed, 186 insertions, 74 deletions
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d25acd51e181..22b19962a1a2 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -104,6 +104,9 @@ loader, and have no meaning to the kernel directly.
 Do not modify the syntax of boot loader parameters without extreme
 need or coordination with <Documentation/i386/boot.txt>.
+There are also arch-specific kernel-parameters not documented here.
+See for example <Documentation/x86_64/boot-options.txt>.
 Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
 a trailing = on the name of any parameter states that that parameter will
 be entered as an environment variable, whereas its absence indicates that
@@ -361,6 +364,11 @@ and is between 256 and 4096 characters. It is defined in the file
                        clocksource is not available, it defaults to PIT.
                        Format: { pit | tsc | cyclone | pmtmr }
+        code_bytes      [IA32] How many bytes of object code to print in an
+                        oops report.
+                        Range: 0 - 8192
+                        Default: 64
        disable_8254_timer
        enable_8254_timer
                        [IA32/X86_64] Disable/Enable interrupt 0 timer routing
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 5c86ed6f0448..625a21db0c2a 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -180,40 +180,81 @@ PCI
  pci=lastbus=NUMBER           Scan upto NUMBER busses, no matter what the mptable says.
  pci=noacpi            Don't use ACPI to set up PCI interrupt routing.
-IOMMU
+IOMMU (input/output memory management unit)
- iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge]
+ Currently four x86-64 PCI-DMA mapping implementations exist:
-         [,forcesac][,fullflush][,nomerge][,noaperture][,calgary]
-   size  set size of iommu (in bytes)
+   1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
-   noagp don't initialize the AGP driver and use full aperture.
+      (e.g. because you have < 3 GB memory).
-   off   don't use the IOMMU
+      Kernel boot message: "PCI-DMA: Disabling IOMMU"
-   leak  turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on)
-   memaper[=order] allocate an own aperture over RAM with size 32MB^order.
+   2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU.
-   noforce don't force IOMMU usage. Default.
+      Kernel boot message: "PCI-DMA: using GART IOMMU"
-   force  Force IOMMU.
-   merge  Do SG merging. Implies force (experimental)
+   3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
-   nomerge Don't do SG merging.
+      e.g. if there is no hardware IOMMU in the system and it is needed because
-   forcesac For SAC mode for masks <40bits  (experimental)
+      you have >3GB memory or told the kernel to use it (iommu=soft).
-   fullflush Flush IOMMU on each allocation (default)
+      Kernel boot message: "PCI-DMA: Using software bounce buffering
-   nofullflush Don't use IOMMU fullflush
+      for IO (SWIOTLB)"
-   allowed  overwrite iommu off workarounds for specific chipsets.
-   soft  Use software bounce buffering (default for Intel machines)
+   4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
-   noaperture Don't touch the aperture for AGP.
+      pSeries and xSeries servers. This hardware IOMMU supports DMA address
-   allowdac Allow DMA >4GB
+      mapping with memory protection, etc.
-            When off all DMA over >4GB is forced through an IOMMU or bounce
+      Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
-            buffering.
-   nodac    Forbid DMA >4GB
+ iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]
-   panic    Always panic when IOMMU overflows
+        [,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
-   calgary  Use the Calgary IOMMU if it is available
+        [,noaperture][,calgary]
-  swiotlb=pages[,force]
+  General iommu options:
+    off                Don't initialize and use any kind of IOMMU.
-  pages  Prereserve that many 128K pages for the software IO bounce buffering.
+    noforce            Don't force hardware IOMMU usage when it is not needed.
-  force  Force all IO through the software TLB.
+                       (default).
+    force              Force the use of the hardware IOMMU even when it is
-  calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
+                       not actually needed (e.g. because < 3 GB memory).
-  calgary=[translate_empty_slots]
+    soft               Use software bounce buffering (SWIOTLB) (default for
-  calgary=[disable=<PCI bus number>]
+                       Intel machines). This can be used to prevent the usage
+                       of an available hardware IOMMU.
+  iommu options only relevant to the AMD GART hardware IOMMU:
+    <size>             Set the size of the remapping area in bytes.
+    allowed            Overwrite iommu off workarounds for specific chipsets.
+    fullflush          Flush IOMMU on each allocation (default).
+    nofullflush        Don't use IOMMU fullflush.
+    leak               Turn on simple iommu leak tracing (only when
+                       CONFIG_IOMMU_LEAK is on). Default number of leak pages
+                       is 20.
+    memaper[=<order>]  Allocate an own aperture over RAM with size 32MB<<order.
+                       (default: order=1, i.e. 64MB)
+    merge              Do scatter-gather (SG) merging. Implies "force"
+                       (experimental).
+    nomerge            Don't do scatter-gather (SG) merging.
+    noaperture         Ask the IOMMU not to touch the aperture for AGP.
+    forcesac           Force single-address cycle (SAC) mode for masks <40bits
+                       (experimental).
+    noagp              Don't initialize the AGP driver and use full aperture.
+    allowdac           Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
+                       DAC is used with 32-bit PCI to push a 64-bit address in
+                       two cycles. When off all DMA over >4GB is forced through
+                       an IOMMU or software bounce buffering.
+    nodac              Forbid DAC mode, i.e. DMA >4GB.
+    panic              Always panic when IOMMU overflows.
+    calgary            Use the Calgary IOMMU if it is available
+  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
+  implementation:
+    swiotlb=<pages>[,force]
+    <pages>            Prereserve that many 128K pages for the software IO
+                       bounce buffering.
+    force              Force all IO through the software TLB.
+  Settings for the IBM Calgary hardware IOMMU currently found in IBM
+  pSeries and xSeries machines:
+    calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
+    calgary=[translate_empty_slots]
+    calgary=[disable=<PCI bus number>]
+    panic              Always panic when IOMMU overflows
    64k,...,8M - Set the size of each PCI slot's translation table
    when using the Calgary IOMMU. This is the size of the translation
@@ -234,14 +275,14 @@ IOMMU
 Debugging
-  oops=panic Always panic on oopses. Default is to just kill the process,
+  oops=panic    Always panic on oopses. Default is to just kill the process,
-             but there is a small probability of deadlocking the machine.
+                but there is a small probability of deadlocking the machine.
-             This will also cause panics on machine check exceptions.
+                This will also cause panics on machine check exceptions.
-             Useful together with panic=30 to trigger a reboot.
+                Useful together with panic=30 to trigger a reboot.
-  kstack=N   Print that many words from the kernel stack in oops dumps.
+  kstack=N      Print N words from the kernel stack in oops dumps.
-  pagefaulttrace Dump all page faults. Only useful for extreme debugging
+  pagefaulttrace  Dump all page faults. Only useful for extreme debugging
                and will create a lot of output.
  call_trace=[old|both|newfallback|new]
@@ -251,15 +292,8 @@ Debugging
                newfallback: use new unwinder but fall back to old if it gets
                        stuck (default)
-  call_trace=[old|both|newfallback|new]
+Miscellaneous
-                old: use old inexact backtracer
-                new: use new exact dwarf2 unwinder
-                both: print entries from both
-                newfallback: use new unwinder but fall back to old if it gets
-                        stuck (default)
-Misc
  noreplacement  Don't replace instructions with more appropriate ones
                 for the CPU. This may be useful on asymmetric MP systems
-                 where some CPU have less capabilities than the others.
+                 where some CPUs have less capabilities than others.
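The sizing rules in the boot-options.txt changes above (memaper allocates 32MB<<order, with a default of order=1, and swiotlb=<pages> reserves that many 128K pages) can be checked with a little arithmetic. The following is only an illustrative sketch: the function names are hypothetical and the code does not touch the kernel; it merely evaluates the documented formulas.

```python
MIB = 1024 * 1024
KIB = 1024


def gart_aperture_bytes(order: int = 1) -> int:
    """Aperture size for iommu=memaper[=<order>], i.e. 32MB << order.

    The default order=1 is taken from the documentation text above.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return (32 * MIB) << order


def swiotlb_bytes(pages: int) -> int:
    """Memory pre-reserved by swiotlb=<pages>, counting 128K per page
    as the documentation states."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * 128 * KIB


if __name__ == "__main__":
    print(gart_aperture_bytes() // MIB, "MB aperture for the default order")
    print(swiotlb_bytes(256) // MIB, "MB reserved by swiotlb=256")
```

Note that the shift matters: the removed text wrote the size as 32MB^order, which reads as exponentiation, while the new text states a left shift, so each increment of order doubles the aperture.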
diff --git a/Documentation/x86_64/cpu-hotplug-spec b/Documentation/x86_64/cpu-hotplug-spec
index 5c0fa345e556..3c23e0587db3 100644
--- a/Documentation/x86_64/cpu-hotplug-spec
+++ b/Documentation/x86_64/cpu-hotplug-spec
@@ -2,7 +2,7 @@ Firmware support for CPU hotplug under Linux/x86-64
 ---------------------------------------------------
 Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to
-know in advance boot time the maximum number of CPUs that could be plugged
+know in advance of boot time the maximum number of CPUs that could be plugged
 into the system. ACPI 3.0 currently has no official way to supply
 this information from the firmware to the operating system.
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks
index bddfddd466ab..5ad65d51fb95 100644
--- a/Documentation/x86_64/kernel-stacks
+++ b/Documentation/x86_64/kernel-stacks
@@ -9,9 +9,9 @@ zombie. While the thread is in user space the kernel stack is empty
 except for the thread_info structure at the bottom.
 In addition to the per thread stacks, there are specialized stacks
-associated with each cpu.  These stacks are only used while the kernel
+associated with each CPU.  These stacks are only used while the kernel
-is in control on that cpu, when a cpu returns to user space the
+is in control on that CPU; when a CPU returns to user space the
-specialized stacks contain no useful data.  The main cpu stacks is
+specialized stacks contain no useful data.  The main CPU stacks are:
 * Interrupt stack.  IRQSTACKSIZE
@@ -32,17 +32,17 @@ x86_64 also has a feature which is not available on i386, the ability
 to automatically switch to a new stack for designated events such as
 double fault or NMI, which makes it easier to handle these unusual
 events on x86_64.  This feature is called the Interrupt Stack Table
-(IST).  There can be up to 7 IST entries per cpu. The IST code is an
+(IST).  There can be up to 7 IST entries per CPU. The IST code is an
-index into the Task State Segment (TSS), the IST entries in the TSS
+index into the Task State Segment (TSS). The IST entries in the TSS
-point to dedicated stacks, each stack can be a different size.
+point to dedicated stacks; each stack can be a different size.
-An IST is selected by an non-zero value in the IST field of an
+An IST is selected by a non-zero value in the IST field of an
 interrupt-gate descriptor.  When an interrupt occurs and the hardware
 loads such a descriptor, the hardware automatically sets the new stack
 pointer based on the IST value, then invokes the interrupt handler.  If
 software wants to allow nested IST interrupts then the handler must
 adjust the IST values on entry to and exit from the interrupt handler.
-(this is occasionally done, e.g. for debug exceptions)
+(This is occasionally done, e.g. for debug exceptions.)
 Events with different IST codes (i.e. with different stacks) can be
 nested.  For example, a debug interrupt can safely be interrupted by an
@@ -58,17 +58,17 @@ The currently assigned IST stacks are :-
  Used for interrupt 12 - Stack Fault Exception (#SS).
-  This allows to recover from invalid stack segments. Rarely
+  This allows the CPU to recover from invalid stack segments. Rarely
  happens.
 * DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
  Used for interrupt 8 - Double Fault Exception (#DF).
-  Invoked when handling a exception causes another exception. Happens
+  Invoked when handling one exception causes another exception. Happens
-  when the kernel is very confused (e.g. kernel stack pointer corrupt)
+  when the kernel is very confused (e.g. kernel stack pointer corrupt).
-  Using a separate stack allows to recover from it well enough in many
+  Using a separate stack allows the kernel to recover from it well enough
-  cases to still output an oops.
+  in many cases to still output an oops.
 * NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
diff --git a/Documentation/x86_64/machinecheck b/Documentation/x86_64/machinecheck
new file mode 100644
index 000000000000..068a6d9904b9
--- /dev/null
+++ b/Documentation/x86_64/machinecheck
@@ -0,0 +1,70 @@
+Configurable sysfs parameters for the x86-64 machine check code.
+Machine checks report internal hardware error conditions detected
+by the CPU. Uncorrected errors typically cause a machine check
+(often with panic), corrected ones cause a machine check log entry.
+Machine checks are organized in banks (normally associated with
+a hardware subsystem) and subevents in a bank. The exact meaning
+of the banks and subevents is CPU specific.
+mcelog knows how to decode them.
+When you see the "Machine check errors logged" message in the system
+log then mcelog should run to collect and decode machine check entries
+from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.
+Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
+(N = CPU number)
+The directory contains some configurable entries:
+Entries:
+bankNctl
+(N bank number)
+        64bit Hex bitmask enabling/disabling specific subevents for bank N
+        When a bit in the bitmask is zero then the respective
+        subevent will not be reported.
+        By default all events are enabled.
+        Note that the BIOS maintains another mask to disable specific events
+        per bank.  This is not visible here.
+The following entries appear for each CPU, but they are truly shared
+between all CPUs.
+check_interval
+        How often to poll for corrected machine check errors, in seconds
+        (Note output is hexadecimal). Default 5 minutes.
+tolerant
+        Tolerance level. When a machine check exception occurs for a non
+        corrected machine check the kernel can take different actions.
+        Since machine check exceptions can happen any time it is sometimes
+        risky for the kernel to kill a process because it defies
+        normal kernel locking rules. The tolerance level configures
+        how hard the kernel tries to recover even at some risk of deadlock.
+        0: always panic,
+        1: panic if deadlock possible,
+        2: try to avoid panic,
+        3: never panic or exit (for testing only)
+        Default: 1
+        Note this only makes a difference if the CPU allows recovery
+        from a machine check exception. Current x86 CPUs generally do not.
+trigger
+        Program to run when a machine check event is detected.
+        This is an alternative to running mcelog regularly from cron
+        and allows events to be detected faster.
+TBD document entries for AMD threshold interrupt configuration
+For more details about the x86 machine check architecture
+see the Intel and AMD architecture manuals from their developer websites.
+For more details about the architecture
+see http://one.firstfloor.org/~andi/mce.pdf
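As a rough illustration of how the sysfs entries described in the new machinecheck file might be interpreted from user space, here is a small sketch. It assumes only what the text states: the per-CPU directory layout, that check_interval is printed in hexadecimal, that a zero bit in bankNctl disables a subevent, and the four tolerant levels. The helper names are hypothetical, and whether the kernel prints a "0x" prefix is not stated, so the parser accepts both forms.

```python
MCE_ROOT = "/sys/devices/system/machinecheck"

# Tolerance levels exactly as listed in the documentation.
TOLERANT_LEVELS = {
    0: "always panic",
    1: "panic if deadlock possible",
    2: "try to avoid panic",
    3: "never panic or exit (for testing only)",
}


def mce_dir(cpu: int) -> str:
    """Per-CPU directory, machinecheckN with N = CPU number."""
    return f"{MCE_ROOT}/machinecheck{cpu}"


def parse_hex(text: str) -> int:
    """Parse a hexadecimal sysfs value, with or without a 0x prefix."""
    return int(text.strip(), 16)


def subevent_enabled(bank_mask: int, bit: int) -> bool:
    """A subevent is reported only if its bit in bankNctl is set."""
    return bool((bank_mask >> bit) & 1)


if __name__ == "__main__":
    # "12c" is what a 300 second (5 minute) interval looks like in hex.
    print(parse_hex("12c\n"), "seconds between polls")
    print(TOLERANT_LEVELS[1])
    print(mce_dir(0))
```

Reading the real files would be a matter of opening, for example, the check_interval file under mce_dir(0) and passing its contents to parse_hex; that step is left out here so the sketch runs on any machine.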
diff --git a/Documentation/x86_64/mm.txt b/Documentation/x86_64/mm.txt
index 133561b9cb0c..f42798ed1c54 100644
--- a/Documentation/x86_64/mm.txt
+++ b/Documentation/x86_64/mm.txt
@@ -3,26 +3,26 @@
 Virtual memory map with 4 level page tables:
-0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm
+0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
 hole caused by [48:63] sign extension
-ffff800000000000 - ffff80ffffffffff (=40bits) guard hole
+ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
-ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of all phys. memory
+ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory
-ffffc10000000000 - ffffc1ffffffffff (=40bits) hole
+ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
-ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space
+ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
 ... unused hole ...
-ffffffff80000000 - ffffffff82800000 (=40MB)   kernel text mapping, from phys 0
+ffffffff80000000 - ffffffff82800000 (=40 MB)   kernel text mapping, from phys 0
 ... unused hole ...
-ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space
+ffffffff88000000 - fffffffffff00000 (=1919 MB) module mapping space
-The direct mapping covers all memory in the system upto the highest
+The direct mapping covers all memory in the system up to the highest
 memory address (this means in some cases it can also include PCI memory
-holes)
+holes).
 vmalloc space is lazily synchronized into the different PML4 pages of
 the processes using the page fault handler, with init_level4_pgt as
 reference.
-Current X86-64 implementations only support 40 bit of address space,
+Current X86-64 implementations only support 40 bits of address space,
-but we support upto 46bits. This expands into MBZ space in the page tables.
+but we support up to 46 bits. This expands into MBZ space in the page tables.
 -Andi Kleen, Jul 2004
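The sizes quoted in the memory map above can be verified directly from the address ranges. The sketch below is illustrative only: the table and function names are hypothetical, and it assumes that the first five ranges are written with inclusive end addresses while the kernel text and module ranges are written with exclusive ends, which is the reading that reproduces the stated 40 MB and 1919 MB.

```python
MIB = 1024 * 1024

# (start, end_exclusive, name); inclusive ends from the table are
# converted by adding 1, which is an interpretation, not kernel code.
REGIONS = [
    (0x0000000000000000, 0x00007fffffffffff + 1, "user space"),
    (0xffff800000000000, 0xffff80ffffffffff + 1, "guard hole"),
    (0xffff810000000000, 0xffffc0ffffffffff + 1,
     "direct mapping of all phys. memory"),
    (0xffffc10000000000, 0xffffc1ffffffffff + 1, "hole"),
    (0xffffc20000000000, 0xffffe1ffffffffff + 1, "vmalloc/ioremap space"),
    (0xffffffff80000000, 0xffffffff82800000, "kernel text mapping"),
    (0xffffffff88000000, 0xfffffffffff00000, "module mapping space"),
]

SIGN_EXTENSION_HOLE = "hole caused by [48:63] sign extension"


def region_size(name: str) -> int:
    """Size in bytes of the named region."""
    for start, end, region in REGIONS:
        if region == name:
            return end - start
    raise KeyError(name)


def classify(addr: int) -> str:
    """Return the name of the region containing a 64-bit address."""
    if 0x0000800000000000 <= addr <= 0xffff7fffffffffff:
        return SIGN_EXTENSION_HOLE
    for start, end, region in REGIONS:
        if start <= addr < end:
            return region
    return "unused hole"


if __name__ == "__main__":
    for start, end, name in REGIONS:
        print(f"{start:016x} {name}: {end - start} bytes")
    print(classify(0xffffffff80100000))
```

Under these assumptions the direct mapping comes out at exactly 2^46 bytes and the module space at 1919 MB, matching the figures in the table.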
65		65
66	Used for interrupt 8 - Double Fault Exception (#DF).	66	Used for interrupt 8 - Double Fault Exception (#DF).
67		67
68	Invoked when handling a exception causes another exception. Happens	68	Invoked when handling one exception causes another exception. Happens
69	when the kernel is very confused (e.g. kernel stack pointer corrupt)	69	when the kernel is very confused (e.g. kernel stack pointer corrupt).
70	Using a separate stack allows to recover from it well enough in many	70	Using a separate stack allows the kernel to recover from it well enough
71	cases to still output an oops.	71	in many cases to still output an oops.
72		72
73	* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).	73	* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
74		74


diff --git a/Documentation/x86_64/machinecheck b/Documentation/x86_64/machinecheck
new file mode 100644
index 000000000000..068a6d9904b9
--- /dev/null
+++ b/Documentation/x86_64/machinecheck
@@ -0,0 +1,70 @@
		1
		2	Configurable sysfs parameters for the x86-64 machine check code.
		3
		4	Machine checks report internal hardware error conditions detected
		5	by the CPU. Uncorrected errors typically cause a machine check
		6	(often with panic); corrected ones cause a machine check log entry.
		7
		8	Machine checks are organized in banks (normally associated with
		9	a hardware subsystem) and subevents in a bank. The exact meaning
		10	of the banks and subevents is CPU specific.
		11
		12	mcelog knows how to decode them.
		13
		14	When you see the "Machine check errors logged" message in the system
		15	log then mcelog should run to collect and decode machine check entries
		16	from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.
		17
		18	Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
		19	(N = CPU number)
		20
		21	The directory contains some configurable entries:
		22
		23	Entries:
		24
		25	bankNctl
		26	(N bank number)
		27	64bit Hex bitmask enabling/disabling specific subevents for bank N
		28	When a bit in the bitmask is zero then the respective
		29	subevent will not be reported.
		30	By default all events are enabled.
		31	Note that the BIOS maintains another mask to disable specific events
		32	per bank. This is not visible here.
		33
		34	The following entries appear for each CPU, but they are truly shared
		35	between all CPUs.
		36
		37	check_interval
		38	How often to poll for corrected machine check errors, in seconds
		39	(Note output is hexadecimal). Default 5 minutes.
		40
		41	tolerant
		42	Tolerance level. When a machine check exception occurs for a non
		43	corrected machine check the kernel can take different actions.
		44	Since machine check exceptions can happen any time it is sometimes
		45	risky for the kernel to kill a process because it defies
		46	normal kernel locking rules. The tolerance level configures
		47	how hard the kernel tries to recover even at some risk of deadlock.
		48
		49	0: always panic,
		50	1: panic if deadlock possible,
		51	2: try to avoid panic,
		52	3: never panic or exit (for testing only)
		53
		54	Default: 1
		55
		56	Note this only makes a difference if the CPU allows recovery
		57	from a machine check exception. Current x86 CPUs generally do not.
		58
		59	trigger
		60	Program to run when a machine check event is detected.
		61	This is an alternative to running mcelog regularly from cron
		62	and allows events to be detected faster.
		63
		64	TBD document entries for AMD threshold interrupt configuration
		65
		66	For more details about the x86 machine check architecture
		67	see the Intel and AMD architecture manuals from their developer websites.
		68
		69	For more details about the architecture
		70	see http://one.firstfloor.org/~andi/mce.pdf
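
As an illustration of the hexadecimal entries described in the new file above, the following is a minimal sketch of decoding such values once they have been read from sysfs. The helper names parse_hex_entry and enabled_subevents are hypothetical, and the exact text format of the entries (with or without a 0x prefix) is an assumption; the file only states that the output is hexadecimal.

```python
# Sketch only: decode hexadecimal sysfs values such as check_interval or
# bankNctl. Helper names are hypothetical; the on-disk text format
# (optional "0x" prefix, trailing newline) is an assumption.

def parse_hex_entry(text: str) -> int:
    """Convert the text of a hexadecimal sysfs entry to an integer."""
    # int(..., 16) accepts both "12c" and "0x12c".
    return int(text.strip(), 16)


def enabled_subevents(mask: int, width: int = 64) -> list[int]:
    """Return the bit positions set in a bankNctl-style bitmask.

    Per the documentation, a zero bit means the respective subevent
    is not reported, so the set bits are the enabled subevents.
    """
    return [bit for bit in range(width) if mask & (1 << bit)]


if __name__ == "__main__":
    # The documented default of 5 minutes is 300 seconds, i.e. 12c in hex.
    print(parse_hex_entry("12c\n"))
    # A mask of 0x5 enables subevents 0 and 2 only.
    print(enabled_subevents(0x5))
```

On a real system the input text would come from a file such as /sys/devices/system/machinecheck/machinecheck0/check_interval; the sketch takes a string so that it does not depend on that path being present.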


diff --git a/Documentation/x86_64/mm.txt b/Documentation/x86_64/mm.txt index 133561b9cb0c..f42798ed1c54 100644 --- a/Documentation/x86_64/mm.txt +++ b/Documentation/x86_64/mm.txt
@@ -3,26 +3,26 @@
3		3
4	Virtual memory map with 4 level page tables:	4	Virtual memory map with 4 level page tables:
5		5
6	0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm	6	0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
7	hole caused by [48:63] sign extension	7	hole caused by [48:63] sign extension
8	ffff800000000000 - ffff80ffffffffff (=40bits) guard hole	8	ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
9	ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of all phys. memory	9	ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory
10	ffffc10000000000 - ffffc1ffffffffff (=40bits) hole	10	ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
11	ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space	11	ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
12	... unused hole ...	12	... unused hole ...
13	ffffffff80000000 - ffffffff82800000 (=40MB) kernel text mapping, from phys 0	13	ffffffff80000000 - ffffffff82800000 (=40 MB) kernel text mapping, from phys 0
14	... unused hole ...	14	... unused hole ...
15	ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space	15	ffffffff88000000 - fffffffffff00000 (=1919 MB) module mapping space
16		16
17	The direct mapping covers all memory in the system upto the highest	17	The direct mapping covers all memory in the system up to the highest
18	memory address (this means in some cases it can also include PCI memory	18	memory address (this means in some cases it can also include PCI memory
19	holes)	19	holes).
20		20
21	vmalloc space is lazily synchronized into the different PML4 pages of	21	vmalloc space is lazily synchronized into the different PML4 pages of
22	the processes using the page fault handler, with init_level4_pgt as	22	the processes using the page fault handler, with init_level4_pgt as
23	reference.	23	reference.
24		24
25	Current X86-64 implementations only support 40 bit of address space,	25	Current X86-64 implementations only support 40 bits of address space,
26	but we support upto 46bits. This expands into MBZ space in the page tables.	26	but we support up to 46 bits. This expands into MBZ space in the page tables.
27		27
28	-Andi Kleen, Jul 2004	28	-Andi Kleen, Jul 2004
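
The sizes quoted in the memory map above can be checked with simple arithmetic. The sketch below is only a sanity check, not kernel code; the function names are hypothetical. It assumes the ranges given in bits have an inclusive last address, while the two ranges given in MB have an exclusive end address, which is what the quoted figures imply.

```python
# Sketch only: verify the "(=N bits)" and "(=N MB)" annotations in the
# x86_64 virtual memory map. Function names are hypothetical.

def span_bits(start: int, last: int) -> int:
    """Size of [start, last] (inclusive) as a power-of-two exponent."""
    size = last - start + 1
    if size & (size - 1):
        raise ValueError("range size is not a power of two")
    return size.bit_length() - 1


def span_mib(start: int, end: int) -> int:
    """Size of [start, end) (end exclusive) in units of 2**20 bytes."""
    return (end - start) // (1 << 20)


if __name__ == "__main__":
    bit_ranges = [
        ("user space",     0x0000000000000000, 0x00007fffffffffff),
        ("guard hole",     0xffff800000000000, 0xffff80ffffffffff),
        ("direct mapping", 0xffff810000000000, 0xffffc0ffffffffff),
        ("hole",           0xffffc10000000000, 0xffffc1ffffffffff),
        ("vmalloc/ioremap", 0xffffc20000000000, 0xffffe1ffffffffff),
    ]
    for name, start, last in bit_ranges:
        print(f"{name}: {span_bits(start, last)} bits")

    print("kernel text:", span_mib(0xffffffff80000000, 0xffffffff82800000), "MB")
    print("modules:", span_mib(0xffffffff88000000, 0xfffffffffff00000), "MB")
```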