aboutsummaryrefslogtreecommitdiffstats
path: root/arch/i386
Commit message (Collapse)AuthorAge
* Revert "fbdev: ignore VESA modes if framebuffer is disabled"Linus Torvalds2007-05-08
| | | | | | | | | | | | | | This reverts commit 464bdd33e9baad9806c7adbd8dfc37081a55f27e. Peter Anvin correctly points out that VESA modes have nothing to do with frame buffers per se - they are often just regular extended text modes. Disabling them just because we don't have frame buffer support is very wrong. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Antonino A. Daplas <adaplas@gmail.com>, Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6Linus Torvalds2007-05-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (32 commits) Use menuconfig objects - hwmon hwmon/smsc47b397: Use dynamic sysfs callbacks hwmon/smsc47b397: Convert to a platform driver hwmon/w83781d: Deprecate W83627HF support hwmon/w83781d: Use dynamic sysfs callbacks hwmon/w83781d: Be less i2c_client-centric hwmon/w83781d: Clean up conversion macros hwmon/w83781d: No longer use i2c-isa hwmon/ams: Do not print error on systems without apple motion sensor hwmon/ams: Fix I2C read retry logic hwmon: New AD7416, AD7417 and AD7418 driver hwmon/coretemp: Add documentation hwmon: New coretemp driver i386: Use functions from library in msr driver i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpu hwmon/lm75: Use dynamic sysfs callbacks hwmon/lm78: Use dynamic sysfs callbacks hwmon/lm78: Be less i2c_client-centric hwmon/lm78: No longer use i2c-isa hwmon: New max6650 driver ...
| * i386: Use functions from library in msr driverNicolas Boichat2007-05-08
| | | | | | | | | | | | | | | | | | Use safe MSR functions provided by arch/*/lib/msr-on-cpu.c in arch/i386/kernel/msr.c. Signed-off-by: Nicolas Boichat <nicolas@boichat.ch> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
| * i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpuRudolf Marek2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add safe (exception handled) variants of rdmsr_on_cpu and wrmsr_on_cpu. You should use these when the target MSR may not actually exist, as doing so could trigger an exception which the regular functions do not handle. The safe variants are slower, though. The upcoming coretemp hardware monitoring driver will need this. Signed-off-by: Rudolf Marek <r.marek@assembler.cz> Cc: Alexey Dobriyan <adobriyan@openvz.org> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
* | fbdev: ignore VESA modes if framebuffer is disabledAntonino A. Daplas2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the option vga=<VESA graphics mode> is added to the boot parameter, it will activate graphics mode, but without any framebuffer support, the user is left with an unusable display. Change the behavior such that the user is instead prompted for another mode (ala vga=ask). NOTE: People can always use vbetool to set a graphics mode if this is really desired, but the number of people doing this approaches zero. Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | x86, serial: convert legacy COM ports to platform devicesBjorn Helgaas2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make x86 COM ports into platform devices and don't probe for them if we have PNP. This prevents double discovery, where a device was found both by the legacy probe and by 8250_pnp, e.g., serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A This also means IRDA devices without a UART PNP ID will no longer be claimed by the serial driver, which might require changes in IRDA drivers and administration. In addition to this patch, you may need to configure a setserial init script, e.g., /etc/init.d/setserial, so it doesn't poke legacy UART stuff back in. On Debian, "dpkg-reconfigure setserial" with the "kernel" option does this. To force the old legacy probe behavior even when we have PNPBIOS or ACPI, load the new legacy_serial module (or build 8250 static) with the "legacy_serial.force" option. [akpm@linux-foundation.org: fix makefiles] Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Keith Owens <kaos@ocs.com.au> Cc: Len Brown <lenb@kernel.org> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Matthieu CASTET <castet.matthieu@free.fr> Cc: Jean Tourrilhes <jt@hpl.hp.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Ville Syrjala <syrjala@sci.fi> Cc: Russell King <rmk+serial@arm.linux.org.uk> Cc: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Add IRQF_IRQPOLL flag on i386Bernhard Walle2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | Add IRQF_IRQPOLL to timer interrupts on i386. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Kprobes: The ON/OFF knob thru debugfsAnanth N Mavinakayanahalli2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides a debugfs knob to turn kprobes on/off o A new file /debug/kprobes/enabled indicates if kprobes is enabled or not (default enabled) o Echoing 0 to this file will disarm all installed probes o Any new probe registration when disabled will register the probe but not arm it. A message will be printed out in such a case. o When a value 1 is echoed to the file, all probes (including ones registered in the intervening period) will be enabled o Unregistration will happen irrespective of whether probes are globally enabled or not. o Update Documentation/kprobes.txt to reflect these changes. While there also update the doc to make it current. We are also looking at providing sysrq key support to tie to the disabling feature provided by this patch. [akpm@linux-foundation.org: Use bool like a bool!] [akpm@linux-foundation.org: add printk facility levels] [cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390] Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | kprobes: kretprobes simplificationsChristoph Hellwig2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - consolidate duplicate code in all arch_prepare_kretprobe instances into common code - replace various odd helpers that use hlist_for_each_entry to get the first elemenet of a list with either a hlist_for_each_entry_save or an opencoded access to the first element in the caller - inline add_rp_inst into it's only remaining caller - use kretprobe_inst_table_head instead of opencoding it Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | utimensat implementationUlrich Drepper2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement utimensat(2) which is an extension to futimesat(2) in that it a) supports nano-second resolution for the timestamps b) allows to selectively ignore the atime/mtime value c) allows to selectively use the current time for either atime or mtime d) supports changing the atime/mtime of a symlink itself along the lines of the BSD lutimes(3) functions For this change the internally used do_utimes() functions was changed to accept a timespec time value and an additional flags parameter. Additionally the sys_utime function was changed to match compat_sys_utime which already use do_utimes instead of duplicating the work. Also, the completely missing futimensat() functionality is added. We have such a function in glibc but we have to resort to using /proc/self/fd/* which not everybody likes (chroot etc). Test application (the syscall number will need per-arch editing): #include <errno.h> #include <fcntl.h> #include <time.h> #include <sys/time.h> #include <stddef.h> #include <syscall.h> #define __NR_utimensat 280 #define UTIME_NOW ((1l << 30) - 1l) #define UTIME_OMIT ((1l << 30) - 2l) int main(void) { int status = 0; int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666); if (fd == -1) error (1, errno, "failed to create test file \"ttt\""); struct stat64 st1; if (fstat64 (fd, &st1) != 0) error (1, errno, "fstat failed"); struct timespec t[2]; t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); struct stat64 st2; if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0) { puts ("atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0] = st1.st_atim; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_OMIT; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("atim not set"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("mtim changed from zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 0; t[0].tv_nsec = UTIME_OMIT; t[1] = st1.st_mtim; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("mtim changed from original time"); status = 1; } if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec) { puts ("mtim not set"); status = 1; } if (status != 0) goto out; sleep (2); t[0].tv_sec = 0; t[0].tv_nsec = UTIME_NOW; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_NOW; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); struct timeval tv; gettimeofday(&tv,NULL); if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec || st2.st_atim.tv_sec > tv.tv_sec) { puts ("atim not set to NOW"); status = 1; } if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec || st2.st_mtim.tv_sec > tv.tv_sec) { puts ("mtim not set to NOW"); status = 1; } if (symlink ("ttt", "tttsym") != 0) error (1, errno, "cannot create symlink"); t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0) error (1, errno, "utimensat failed"); if (lstat64 ("tttsym", &st2) != 0) error (1, errno, "lstat failed"); if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0) { puts ("symlink atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("symlink mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 1; t[0].tv_nsec = 0; t[1].tv_sec = 1; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0) { puts ("atim not reset to one"); status = 1; } if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to one"); status = 1; } if (status == 0) puts ("all OK"); out: close (fd); unlink ("ttt"); unlink ("tttsym"); return status; } [akpm@linux-foundation.org: add missing i386 syscall table entry] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Alexey Dobriyan <adobriyan@openvz.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | dma_declare_coherent_memory wrong allocationGuennadi Liakhovetski2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | dma_declare_coherent_memory() allocates a bitmap 1 bit per page, it calculates the bitmap size based on size of long, but allocates bytes... Thanks to James Bottomley for clarifications and corrections. Signed-off-by: G. Liakhovetski <g.liakhovetski@gmx.de> Acked-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | apm: fix incorrect commentAlan Cox2007-05-08
| | | | | | | | | | | | | | | | HZ has not always been 100Hz for some time. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | EFI: warn only for pre-1.00 system tablesBjorn Helgaas2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | We used to warn unless the EFI system table major revision was exactly 1. But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't affect anything in Linux. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Kprobes: print details of kretprobe on assertion failureAnanth N Mavinakayanahalli2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In certain cases like when the real return address can't be found or when the number of tracked calls to a kretprobed function is less than the number of returns, we may not be able to find the correct return address after processing a kretprobe. Currently we just do a BUG_ON, but no information is provided about the actual failing kretprobe. Print out details of the kretprobe before calling BUG(). Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | header cleaning: don't include smp_lock.h when not usedRandy Dunlap2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | move die notifier handling to common codeChristoph Hellwig2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ipmi: add new IPMI nmi watchdog handlingCorey Minyard2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | Convert over to the new NMI handling for getting IPMI watchdog timeouts via an NMI. This add config options to know if there is the ability to receive NMIs and if it has an NMI post processing call. Then it modifies the IPMI watchdog to take advantage of this so that it can know if an NMI comes in. It also adds testing that the IPMI NMI watchdog works. Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | use SLAB_PANIC flag cleanupAkinobu Mita2007-05-08
|/ | | | | | | | | | | | | | Use SLAB_PANIC and delete duplicated panic(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Ian Molton <spyro@f2s.com> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* get_unmapped_area handles MAP_FIXED on i386Benjamin Herrenschmidt2007-05-07
| | | | | | | | | | | | | Handle MAP_FIXED in i386 hugetlb_get_unmapped_area(), just call prepare_hugepage_range. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Andi Kleen <ak@suse.de> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* SLUB coreChristoph Lameter2007-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation. A. Management of object queues A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for each allocating CPU and use objects from a slab directly instead of queueing them up. B. Storage overhead of object queues SLAB Object queues exist per node, per CPU. The alien cache queue even has a queue array that contain a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes just tied up for storing references to objects for those queues This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues. C. SLAB meta data overhead SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding page_struct. Objects can be naturally aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this. D. SLAB has a complex cache reaper SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into partial list but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping which may cause strange hold offs. E. SLAB has complex NUMA policy layer support SLUB pushes NUMA policy handling into the page allocator. This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLABs application of policies to individual slab objects allocated in SLAB is certainly a performance concern due to the frequent references to memory policies which may lead a sequence of objects to come from one node after another. SLUB will get a slab full of objects from one node and then will switch to the next. F. Reduction of the size of partial slab lists SLAB has per node partial lists. This means that over time a large number of partial slabs may accumulate on those lists. These can only be reused if allocator occur on specific nodes. SLUB has a global pool of partial slabs and will consume slabs from that pool to decrease fragmentation. G. Tunables SLAB has sophisticated tuning abilities for each slab cache. One can manipulate the queue sizes in detail. However, filling the queues still requires the uses of the spin lock to check out slabs. SLUB has a global parameter (min_slab_order) for tuning. Increasing the minimum slab order can decrease the locking overhead. The bigger the slab order the less motions of pages between per CPU and partial lists occur and the better SLUB will be scaling. G. Slab merging We often have slab caches with similar parameters. SLUB detects those on boot up and merges them into the corresponding general caches. This leads to more effective memory use. About 50% of all caches can be eliminated through slab merging. This will also decrease slab fragmentation because partial allocated slabs can be filled up again. Slab merging can be switched off by specifying slub_nomerge on boot up. Note that merging can expose heretofore unknown bugs in the kernel because corrupted objects may now be placed differently and corrupt differing neighboring objects. Enable sanity checks to find those. H. Diagnostics The current slab diagnostics are difficult to use and require a recompilation of the kernel. SLUB contains debugging code that is always available (but is kept out of the hot code paths). SLUB diagnostics can be enabled via the "slab_debug" option. Parameters can be specified to select a single or a group of slab caches for diagnostics. This means that the system is running with the usual performance and it is much more likely that race conditions can be reproduced. I. Resiliency If basic sanity checks are on then SLUB is capable of detecting common error conditions and recover as best as possible to allow the system to continue. J. Tracing Tracing can be enabled via the slab_debug=T,<slabcache> option during boot. SLUB will then protocol all actions on that slabcache and dump the object contents on free. K. On demand DMA cache creation. Generally DMA caches are not needed. If a kmalloc is used with __GFP_DMA then just create this single slabcache that is needed. For systems that have no ZONE_DMA requirement the support is completely eliminated. L. Performance increase Some benchmarks have shown speed improvements on kernbench in the range of 5-10%. The locking overhead of slub is based on the underlying base allocation size. If we can reliably allocate larger order pages then it is possible to increase slub performance much further. The anti-fragmentation patches may enable further performance increases. Tested on: i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator SLUB Boot options slub_nomerge Disable merging of slabs slub_min_order=x Require a minimum order for slab caches. This increases the managed chunk size and therefore reduces meta data and locking overhead. slub_min_objects=x Mininum objects per slab. Default is 8. slub_max_order=x Avoid generating slabs larger than order specified. slub_debug Enable all diagnostics for all caches slub_debug=<options> Enable selective options for all caches slub_debug=<o>,<cache> Enable selective options for a certain set of caches Available Debug options F Double Free checking, sanity and resiliency R Red zoning P Object / padding poisoning U Track last free / alloc T Trace all allocs / frees (only use for individual slabs). To use SLUB: Apply this patch and then select SLUB as the default slab allocator. [hugh@veritas.com: fix an oops-causing locking error] [akpm@linux-foundation.org: various stupid cleanups and small fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Revert "[PATCH] x86: __pa and __pa_symbol address space separation"Linus Torvalds2007-05-07
| | | | | | | | | | | | | | | | | | | | | | | This was broken. It adds complexity, for no good reason. Rather than separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(), and preferably __pa() too - and just use "virt_to_phys()" instead, which is more readable and has nicer semantics. However, right now, just undo the separation, and make __pa_symbol() be the exact same as __pa(). That fixes the bugs this patch introduced, and we can do the fairly obvious cleanups later. Do the new __phys_addr() function (which is now the actual workhorse for the unified __pa()/__pa_symbol()) as a real external function, that way all the potential issues with compile/link-time optimizations of constant symbol addresses go away, and we can also, if we choose to, add more sanity-checking of the argument. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6Linus Torvalds2007-05-05
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits) [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall [PATCH] i386: type may be unused [PATCH] i386: Some additional chipset register values validation. [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu [PATCH] i386: white space fixes in i387.h [PATCH] i386: Drop noisy e820 debugging printks [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems [PATCH] x86-64: Share identical video.S between i386 and x86-64 [PATCH] x86-64: Remove CONFIG_REORDER [PATCH] x86-64: Print type and size correctly for unknown compat ioctls [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) [PATCH] i386: Little cleanups in smpboot.c [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible [PATCH] i386: Add X86_FEATURE_RDTSCP [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 [PATCH] i386: Implement alternative_io for i386 ... Fix up trivial conflict in include/linux/highmem.h manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in ↵Thomas Renninger2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | late_initcall In arch/i386/cpu/common.c there is: cpu_devs[X86_VENDOR_INTEL] cpu_devs[X86_VENDOR_CYRIX] cpu_devs[X86_VENDOR_AMD] ... They are all filled with data early. The data (struct) got set to NULL for all, but Intel in different late_initcall (exit_cpu_vendor) calls. I don't see what sense this makes at all, maybe something that got forgotten with the HOTPLUG_CPU extenstions? Please check/review whether initdata, cpuinitdata is still ok and this still works with HOTPLUG_CPU and without, it should... Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: davej@redhat.com
| * [PATCH] i386: type may be unusedDavid Rientjes2007-05-02
| | | | | | | | | | | | | | | | | | In the case of !CONFIG_PCI_DIRECT && !CONFIG_PCI_MMCONFIG, type is unreferened. Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Some additional chipset register values validation.Olivier Galibert2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On i945, a mmconfig range hitting the f0000000-ffffffff zone conflicts with the APIC registers and others. Consider it invalid. On E7520, values 0000 and f000 for the window register are defined invalid in the documentation. I haven't seen a bios use these values, but who trusts biosen these days? Signed-off-by: Olivier Galibert <galibert@pobox.com> Signed-off-by: Andi Kleen <ak@suse.de> arch/i386/pci/mmconfig-shared.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)
| * [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.Bill Irwin2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only 1GB-aligned kernel/user splits are now handled for PAE. The 2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1 mapping for physical memory by using an actual split of 1.875/2.125 to accommodate 128MB of vmallocspace out of what would otherwise be a full 2GB for userspace. That attempt disturbs the alignment required by PAE for 2GB/2GB splits, and furthermore does not provide a 2GB/2GB split as advertised. This patch resolves the issues here in two manners. The first is by providing a true 2GB/2GB split in addition to the 1.875/2.125 split. The second is by renaming the 1.875/2.125 split to CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which performs a similar manuever to avoid aliasing vmallocspace with the 1:1 mapping for physical memory around the 3GB boundary. With the 1.875/2.125 split properly-named, its config option is then tagged as depending on !HIGHMEM to express the PAE implementation's current inability to deal with such unaligned splits. This patch is essentially a combination of two patches, one written by Eric Biederman and the other by Eric Dumazet. If they could add their Signed-off-by: to this, I'd be much obliged. Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Mark Lord <lkml@rtr.ca> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Drop noisy e820 debugging printksAndi Kleen2007-05-02
| | | | | | | | Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)Andi Kleen2007-05-02
| | | | | | | | | | | | access_ok checks this case anyways, no need to check twice. Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Little cleanups in smpboot.cAndi Kleen2007-05-02
| | | | | | | | | | | | | | - Remove #if that is always set - Fix warning Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386Andi Kleen2007-05-02
| | | | | | | | | | | | Syncs up with x86-64. Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Verify important CPUID bits in real modeAndi Kleen2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | Check some CPUID bits that are needed for compiler generated early in boot. When the system is still in real mode before changing the VESA BIOS mode it is possible to still display an visible error message on the screen. Similar to x86-64. Includes cleanups from Eric Biederman Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Drop -traditional in arch/i386/bootAndi Kleen2007-05-02
| | | | | | | | | | | | Needed for followon patch Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Clean up NMI watchdog codeAndi Kleen2007-05-02
| | | | | | | | | | | | | | | | | | | | | | - Introduce a wd_ops structure - Convert the various nmi watchdogs over to it - This allows to split the perfctr reservation from the watchdog setup cleanly. - Do perfctr reservation globally as it should have always been - Remove dead code referenced only by unused EXPORT_SYMBOLs Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: fix wrong comment for syscall stack layoutAndi Kleen2007-05-02
| | | | | | | | | | | | | | | | `ret_from_sys_call' label no longer exist and `syscall_exit' label was introduced instead. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: convert to the kthread APIEric W. Biederman2007-05-02
| | | | | | | | | | | | | | | | | | | | This patch just trivial converts from calling kernel_thread and daemonize to just calling kthread_run. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * [PATCH] i386: remove xtime_lock'ing around cpufreq notifierDaniel Walker2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The locking of the xtime_lock around the cpu notifier is unessesary now. At one time the tsc was used after a frequency change for timekeeping, but the re-write of timekeeping no longer uses the TSC unless the frequency is constant. The variables that are changed in this section of code had also once been used for timekeeping, but not any longer .. Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * [PATCH] i386: check capabilityJoachim Deguara2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the i386 architecture checks the family for mce capability and this removes that and uses the CPUID information. Tested on a K8 revE and a family10h processor. This eliminates checking of a set AMD procesor family if mce is allowed and relies on the information being in CPUID. Signed-off-by: Joachim Deguara <joachim.deguara@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * [PATCH] i386: clean up flush_tlb_others fnKeshavamurthy, Anil S2007-05-02
| | | | | | | | | | | | | | | | | | Cleanup flush_tlb_others(), no functional change. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * [PATCH] i386: replace spin_lock_irqsave with spin_lockHisashi Hifumi2007-05-02
| | | | | | | | | | | | | | | | | | | | IRQ is already disabled through local_irq_disable(). So spin_lock_irqsave() can be replaced with spin_lock(). Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * [PATCH] i386: avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not definedKeshavamurthy, Anil S2007-05-02
| | | | | | | | | | | | | | | | | | | | | | Avoid checking for cpu gone in mm hot path when CONFIG_HOTPLUG_CPU is not defined. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * [PATCH] i386: PARAVIRT: fix startup_ipi_hook config dependencyJeremy Fitzhardinge2007-05-02
| | | | | | | | | | | | | | | | startup_ipi_hook depends on CONFIG_X86_LOCAL_APIC, so move it to the right part of the paravirt_ops initialization. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: fix mtrr sectionsRandy Dunlap2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix section mismatch warnings in mtrr code. Fix line length on one source line. WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.data: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x103) WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x180) WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x199) WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x1c1) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * [PATCH] i386: Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386Fernando Luis [** ISO-8859-1 charset **] VzquezCao2007-05-02
| | | | | | | | | | | | | | | | | | Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is NMI_VECTOR to avoid potential hangups in the event of crash when kdump tries to stop the other CPUs. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: __send_IPI_dest_field - i386Fernando Luis [** ISO-8859-1 charset **] VzquezCao2007-05-02
| | | | | | | | | | | | | | | | | | Implement __send_IPI_dest_field which can be used to send IPIs when the "destination shorthand" field of the ICR is set to 00 (destination field). Use it whenever possible. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: use safe_apic_wait_icr_idle in smpboot.cFernando Luis VazquezCao2007-05-02
| | | | | | | | | | | | | | | | | | __inquire_remote_apic is used for APIC debugging, so use safe_apic_wait_icr_idle instead of apic_wait_icr_idle to avoid possible lockups when APIC delivery fails. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: use safe_apic_wait_icr_idle - i386Fernando Luis VazquezCao2007-05-02
| | | | | | | | | | | | | | | | | | The functionality provided by the new safe_apic_wait_icr_idle is being open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle instead to consolidate code and ease maintenance. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: safe_apic_wait_icr_idle - i386Fernando Luis VazquezCao2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | apic_wait_icr_idle looks like this: static __inline__ void apic_wait_icr_idle(void) { while (apic_read(APIC_ICR) & APIC_ICR_BUSY) cpu_relax(); } The busy loop in this function would not be problematic if the corresponding status bit in the ICR were always updated, but that does not seem to be the case under certain crash scenarios. Kdump uses an IPI to stop the other CPUs in the event of a crash, but when any of the other CPUs are locked-up inside the NMI handler the CPU that sends the IPI will end up looping forever in the ICR check, effectively hard-locking the whole system. Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3: "A local APIC unit indicates successful dispatch of an IPI by resetting the Delivery Status bit in the Interrupt Command Register (ICR). The operating system polls the delivery status bit after sending an INIT or STARTUP IPI until the command has been dispatched. A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions. If the IPI is not successfully dispatched, the operating system can abort the command. Alternatively, the operating system can retry the IPI by writing the lower 32-bit double word of the ICR. This “time-out” mechanism can be implemented through an external interrupt, if interrupts are enabled on the processor, or through execution of an instruction or time-stamp counter spin loop." Intel's documentation suggests the implementation of a time-out mechanism, which, by the way, is already being open-coded in some parts of the kernel that tinker with ICR. Create a apic_wait_icr_idle replacement that implements the time-out mechanism and that can be used to solve the aforementioned problem. AK: moved both functions out of line AK: added improved loop from Keith Owens Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in syncBernhard Kaindl2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If our copy of the MTRRs of the BSP has RdMem or WrMem set, and we are running on an AMD64/K8 system, the boot CPU must have had MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would have copied these bits cleared), so we set them on this CPU as well. This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync across the CPUs of SMP systems in order to fullfill the duty of system software to "initialize and maintain MTRR consistency across all processors." as written in the AMD and Intel manuals. If an WRMSR instruction fails because MtrrFixDramModEn is not set, I expect that also the Intel-style MTRR bits are not updated. AK: minor cleanup, moved MSR defines around Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>
| * [PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspendingBernhard Kaindl2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note: This patch didn'nt need an update since it's initial post. Some BIOSes may modify fixed-range MTRRs in SMM, e.g. when they transition the system into ACPI mode, which is entered thru an SMI, triggered by Linux in acpi_enable(). SMIs which cause that Linux is interrupted and BIOS code is executed (which may change e.g. fixed-range MTRRs) in SMM may be raised by an embedded system controller which is often found in notebooks also at other occasions. If we would not update our copy of the fixed-range MTRRs before suspending to RAM or to disk, restore_processor_state() would set the fixed-range MTRRs of the BSP using old backup values which may be outdated and this could cause the system to fail later during resume. This patch ensures that our copy of the fixed-range MTRRs is updated when saving the boot processor state on suspend to disk and suspend to RAM. In combination with other patches this allows to fix s2ram and s2disk on the Acer Ferrari 1000 notebook and at least s2disk on the Acer Ferrari 5000 notebook. Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>
| * [PATCH] x86: Save the MTRRs of the BSP before booting an APBernhard Kaindl2007-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Applied fix by Andew Morton: http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'. AMD and Intel x86 CPU manuals state that it is the responsibility of system software to initialize and maintain MTRR consistency across all processors in Multi-Processing Environments. Quote from page 188 of the AMD64 System Programming manual (Volume 2): 7.6.5 MTRRs in Multi-Processing Environments "In multi-processing environments, the MTRRs located in all processors must characterize memory in the same way. Generally, this means that identical values are written to the MTRRs used by the processors." (short omission here) "Failure to do so may result in coherency violations or loss of atomicity. Processor implementations do not check the MTRR settings in other processors to ensure consistency. It is the responsibility of system software to initialize and maintain MTRR consistency across all processors." Current Linux MTRR code already implements the above in the case that the BIOS does not properly initialize MTRRs on the secondary processors, but the case where the fixed-range MTRRs of the boot processor are changed after Linux started to boot, before the initialsation of a secondary processor, is not handled yet. In this case, secondary processors are currently initialized by Linux with MTRRs which the boot processor had very early, when mtrr_bp_init() did run, but not with the MTRRs which the boot processor uses at the time when that secondary processors is actually booted, causing differing MTRR contents on the secondary processors. Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs of the boot processor when it transitions the system into ACPI mode. The SMI handler of the BIOS does this in SMM, entered while Linux ACPI code runs acpi_enable(). Other occasions where the SMI handler of the BIOS may change bits in the MTRRs could occur as well. To initialize newly booted secodary processors with the fixed-range MTRRs which the boot processor uses at that time, this patch saves the fixed-range MTRRs of the boot processor before new secondary processors are started. When the secondary processors run their Linux initialisation code, their fixed-range MTRRs will be updated with the saved fixed-range MTRRs. If CONFIG_MTRR is not set, we define mtrr_save_state as an empty statement because there is nothing to do. Possible TODOs: *) CPU-hotplugging outside of SMP suspend/resume is not yet tested with this patch. *) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up, then the calls to mtrr_save_state() could be replaced by calls to mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be needed. That would need either verification of the CPU-hotplug code or at least a test on a >2 CPU machine. *) The MTRRs of other running processors are not yet checked at this time but it might be interesting to syncronize the MTTRs of all processors before booting. That would be an incremental patch, but of rather low priority since there is no machine known so far which would require this. AK: moved prototypes on x86-64 around to fix warnings Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>