aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds2011-08-04
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] signal: use set_restore_sigmask() helper [S390] smp: remove pointless comments in startup_secondary() [S390] qdio: Use kstrtoul_from_user [S390] sclp_async: Use kstrtoul_from_user [S390] exec: remove redundant set_fs(USER_DS) [S390] cpu hotplug: on cpu start wait until being marked active [S390] signal: convert to use set_current_blocked() [S390] asm offsets: fix coding style [S390] Add support for IBM zEnterprise 114 [S390] dasd: check if raw track access is supported [S390] Use diagnose 308 for system reset [S390] Export store_status() function [S390] dasd: use vmalloc for statistics input buffer [S390] Add PSW restart shutdown trigger [S390] missing return in page_table_alloc_pgste [S390] qdio: 2nd stage retry on SIGA-W busy conditions
| * [S390] signal: use set_restore_sigmask() helperHeiko Carstens2011-08-03
| | | | | | | | | | | | | | | | We should call set_restore_sigmask() instead of directly setting TIF_RESTORE_SIGMASK. This change should have been done three years earlier... see 4e4c22 "signals: add set_restore_sigmask". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] smp: remove pointless comments in startup_secondary()Heiko Carstens2011-08-03
| | | | | | | | | | | | | | | | Remove pointless comments in startup_secondary(). There is not too much value in having comments like e.g. "call cpu notifiers" just before a call to notify_cpu*(). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] qdio: Use kstrtoul_from_userPeter Huewe2011-08-03
| | | | | | | | | | | | | | | | | | | | This patch replaces the code for getting an unsigned long from a userspace buffer by a simple call to kstroul_from_user. This makes it easier to read and less error prone. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Acked-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] sclp_async: Use kstrtoul_from_userPeter Huewe2011-08-03
| | | | | | | | | | | | | | | | | | This patch replaces the code for getting an unsigned long from a userspace buffer by a simple call to kstroul_from_user. This makes it easier to read and less error prone. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] exec: remove redundant set_fs(USER_DS)Mathias Krause2011-08-03
| | | | | | | | | | | | | | | | The address limit is already set in flush_old_exec() so those calls to set_fs(USER_DS) are redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] cpu hotplug: on cpu start wait until being marked activeHeiko Carstens2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | This is the same as fd8a7de1 "x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU". Unlike on x86 this doesn't fix a bug on s390 since we do not have threaded interrupt handlers. However we want to keep the same initialization order like on x86. This should prevent bugs caused by code which assumes (and relies on) the init order is the same on each architecture. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] signal: convert to use set_current_blocked()Heiko Carstens2011-08-03
| | | | | | | | | | | | Convert to use set_current_blocked() like x86. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] asm offsets: fix coding styleHeiko Carstens2011-08-03
| | | | | | | | | | | | | | Because of readability reasons we ignore the 80 character line limit in asm offsets. Just one line per define, nothing else. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] Add support for IBM zEnterprise 114Heiko Carstens2011-08-03
| | | | | | | | | | | | Just fix up the Kconfig description and the elf platform. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] dasd: check if raw track access is supportedStefan Haberland2011-08-03
| | | | | | | | | | | | | | | | | | | | | | To use raw track access some special storage server commands are needed. Older storage hardware may not support these commands. So check if raw track access is possible while setting the DASD online. Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] Use diagnose 308 for system resetMichael Holzheu2011-08-03
| | | | | | | | | | | | | | | | The diagnose 308 call is the prefered method for clearing all ongoing I/O. Therefore if it is available we use it instead of doing a manual reset. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] Export store_status() functionMichael Holzheu2011-08-03
| | | | | | | | | | | | | | | | | | For kdump we need a store status function to save the registers for the current CPU. Therefore this patch exports a function "store_status()". In addition to that now also floating point registers are saved correctly. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] dasd: use vmalloc for statistics input bufferStefan Weinhuber2011-08-03
| | | | | | | | | | | | | | | | | | | | The size of the buffer that is used to store DASD statistics input strings depends on the user input. If the input string is to large, the write operation could fail with -ENOMEM. To avoid this, use vmalloc instead of kmalloc. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] Add PSW restart shutdown triggerMichael Holzheu2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | With this patch a new S390 shutdown trigger "restart" is added. If under z/VM "systerm restart" is entered or under the HMC the "PSW restart" button is pressed, the PSW located at 0 (31 bit) or 0x1a0 (64 bit) bit is loaded. Now we execute do_restart() that processes the restart action that is defined under /sys/firmware/shutdown_actions/on_restart. Currently the following actions are possible: reipl (default), stop, vmcmd, dump, and dump_reipl. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] missing return in page_table_alloc_pgsteJan Glauber2011-08-03
| | | | | | | | | | | | | | | | | | | | | | Fix the following compile warning for !CONFIG_PGSTE: CC arch/s390/mm/pgtable.o arch/s390/mm/pgtable.c: In function ‘page_table_alloc_pgste’: arch/s390/mm/pgtable.c:531:1: warning: no return statement in function returning non-void [-Wreturn-type] Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * [S390] qdio: 2nd stage retry on SIGA-W busy conditionsJan Glauber2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SIGA-W may return with the busy bit set which means the device was blocked. The busy loop which retries the SIGA-W for 100us may not be long enough when running under a heavily loaded hypervisor. Extend the retry mechanism by adding a longer second stage which retries the SIGA-W for up to 10s. In difference to the first retry loop the second stage is using mdelay to stop the cpu between the retries and thereby avoid additional preassure in on the hypervisor. If the second stage retry is successfull a device reset is avoided. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* | eisa/pci_eisa.c: fix BUG introduced by 005bdad7b80Arnaud Lacombe2011-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While `pci_eisa_driver' still refer `pci_eisa_init', the .probe() function should not be called after init memory release, as pointed out by commit 74b9a297. The structure is still referenced in the drivers subsystem, and can be accesseed through sysfs, so the modpost warning is a false positive. Mark it as such. In the same time, the warning referenced in 005bdad7b80 did only mention `pci_eisa_driver', not `pci_eisa_pci_tbl', so remove its marking. Broken-by: Arnaud Lacombe <lacombar@gmail.com> (in 005bdad7b80) Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Arnaud Lacombe <lacombar@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Boot up with usermodehelper disabledLinus Torvalds2011-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The core device layer sends tons of uevent notifications for each device it finds, and if the kernel has been built with a non-empty CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode helper binary for all these events very early in the boot. Not only won't the root filesystem even be mounted at that point, we literally won't have necessarily even initialized all the process handling data structures at that point, which causes no end of silly problems even when the usermode helper doesn't actually succeed in executing. So just use our existing infrastructure to disable the usermodehelpers to make the kernel start out with them disabled. We enable them when we've at least initialized stuff a bit. Problems related to an uninitialized init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex reported by various people. Reported-by: Manuel Lauss <manuel.lauss@googlemail.com> Reported-by: Richard Weinberger <richard@nod.at> Reported-by: Marc Zyngier <maz@misterjones.org> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | x86: don't include xen/xen.h in <asm/io.h> unless XEN is enabledLinus Torvalds2011-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry Kasatkin reports: "kernel-devel package with kernel headers have no <include/xen> directory if XEN is disabled. Modules which inclide asm/io.h won't compile. XEN related content is behind the CONFIG_XEN flag in the io.h. And <xen/xen.h> should be also behind CONFIG_XEN flag." So move the include of <xen/xen.h> down into the section that is conditional on CONFIG_XEN. Reported-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-04
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: ad7879 - fix deficient device disable Input: gpio_keys - fix two typos in devicetree documentation Input: mma8450 - add device tree probe support Input: gpio_keys - return proper error code if memory allocation fails Input: lm8323 - add missing device_remove_file for dev_attr_time Input: tegra-kbc - fix computation of polling time Input: kxtj9 - explicitly include module.h Input: psmouse - hgpk.c needs module.h
| * | Input: ad7879 - fix deficient device disableMichael Hennerich2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Input close or device disable should not interact with the exported gpiolib functionality. However that's the case. __ad7879_disable() clears the entire AD7879_REG_CTRL2, while it should just power down the ADC and its reference. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Input: gpio_keys - fix two typos in devicetree documentationTobias Klauser2011-08-03
| | | | | | | | | | | | | | | Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Input: mma8450 - add device tree probe supportShawn Guo2011-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | It adds device tree probe support for mma8450 driver. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Eric Miao <eric.miao@linaro.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Input: gpio_keys - return proper error code if memory allocation failsTobias Klauser2011-07-30
| | | | | | | | | | | | | | | | | | | | | Return -ENOMEM if kzalloc fails in gpio_keys_get_devtree_pdata(). Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Input: lm8323 - add missing device_remove_file for dev_attr_timeAxel Lin2011-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing device_remove_file() for dev_attr_time in lm8323_remove(). Also calling device_remove_file() in lm8323_probe() error path to remove sysfs attribute file. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Input: tegra-kbc - fix computation of polling timeRakesh Iyer2011-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a constant definition and computation of polling time. [dtor@mail.ru: switched to using DIV_ROUND_UP as was suggested by Thierry Reding <thierry.reding@avionic-design.de>] Signed-off-by: Rakesh Iyer <riyer@nvidia.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Input: kxtj9 - explicitly include module.hStephen Rothwell2011-07-30
| | | | | | | | | | | | | | | | | | | | | | | | We need to explicitly include module.h since some of its facilities are used. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Input: psmouse - hgpk.c needs module.hRandy Dunlap2011-07-30
| | | | | | | | | | | | | | | | | | | | | | | | hgpk.c uses interfaces from linux/module.h, so it should include that file. This fixes build errors. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
* | | Merge branch 'idle-release' of ↵Linus Torvalds2011-08-04
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: cpuidle: stop depending on pm_idle x86 idle: move mwait_idle_with_hints() to where it is used cpuidle: replace xen access to x86 pm_idle and default_idle cpuidle: create bootparam "cpuidle.off=1" mrst_pmu: driver for Intel Moorestown Power Management Unit
| * | | cpuidle: stop depending on pm_idleLen Brown2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpuidle users should call cpuidle_call_idle() directly rather than via (pm_idle)() function pointer. Architecture may choose to continue using (pm_idle)(), but cpuidle need not depend on it: my_arch_cpu_idle() ... if(cpuidle_call_idle()) pm_idle(); cc: Kevin Hilman <khilman@deeprootsystems.com> cc: Paul Mundt <lethal@linux-sh.org> cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | x86 idle: move mwait_idle_with_hints() to where it is usedLen Brown2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...and make it static no functional change cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | cpuidle: replace xen access to x86 pm_idle and default_idleLen Brown2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a Xen Dom0 kernel boots on a hypervisor, it gets access to the raw-hardware ACPI tables. While it parses the idle tables for the hypervisor's beneift, it uses HLT for its own idle. Rather than have xen scribble on pm_idle and access default_idle, have it simply disable_cpuidle() so acpi_idle will not load and architecture default HLT will be used. cc: xen-devel@lists.xensource.com Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | cpuidle: create bootparam "cpuidle.off=1"Len Brown2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | useful for disabling cpuidle to fall back to architecture-default idle loop cpuidle drivers and governors will fail to register. on x86 they'll say so: intel_idle: intel_idle yielding to (null) ACPI: acpi_idle yielding to (null) Signed-off-by: Len Brown <len.brown@intel.com>
| * | | mrst_pmu: driver for Intel Moorestown Power Management UnitLen Brown2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Moorestown (MRST) Power Management Unit (PMU) driver directs the SOC power states in the "Langwell" south complex (SCU). It hooks pci_platform_pm_ops[] and thus observes all PCI ".set_state" requests. For devices in the SC, the pmu driver translates those PCI requests into the appropriate commands for the SCU. The PMU driver helps implement S0i3, a deep system idle power idle state. Entry into S0i3 is via cpuidle, just like regular processor c-states. S0i3 depends on pre-conditions including uni-processor, graphics off, and certain IO devices in the SC must be off. If those pre-conditions are met, then the PMU allows cpuidle to enter S0i3, otherwise such requests are demoted, either to Atom C4 or Atom C6. This driver is based on prototype work by Bruce Flemming, Illyas Mansoor, Rajeev D. Muralidhar, Vishwesh M. Rudramuni, Hari Seshadri and Sujith Thomas. The current driver also includes contributions from H. Peter Anvin, Arjan van de Ven, Kristen Accardi, and Yong Wang. Thanks for additional review feedback from Alan Cox and Randy Dunlap. Acked-by: Alan Cox <alan@linux.intel.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
* | | | Merge branch 'apei-release' of ↵Linus Torvalds2011-08-04
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI, APEI, EINJ Param support is disabled by default APEI GHES: 32-bit buildfix ACPI: APEI build fix ACPI, APEI, GHES: Add hardware memory error recovery support HWPoison: add memory_failure_queue() ACPI, APEI, GHES, Error records content based throttle ACPI, APEI, GHES, printk support for recoverable error via NMI lib, Make gen_pool memory allocator lockless lib, Add lock-less NULL terminated single list Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG ACPI, APEI, Add WHEA _OSC support ACPI, APEI, Add APEI bit support in generic _OSC call ACPI, APEI, GHES, Support disable GHES at boot time ACPI, APEI, GHES, Prevent GHES to be built as module ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST ACPI, APEI, Add apei_exec_run_optional ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic ACPI, APEI, ERST, Fix erst-dbg long record reading issue ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled
| * \ \ \ Merge branch 'apei' into apei-releaseLen Brown2011-08-03
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some trivial conflicts due to other various merges adding to the end of common lists sooner than this one. arch/ia64/Kconfig arch/powerpc/Kconfig arch/x86/Kconfig lib/Kconfig lib/Makefile Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI, APEI, EINJ Param support is disabled by defaultHuang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EINJ parameter support is only usable for some specific BIOS. Originally, it is expected to have no harm for BIOS does not support it. But now, we found it will cause issue (memory overwriting) for some BIOS. So param support is disabled by default and only enabled when newly added module parameter named "param_extension" is explicitly specified. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | APEI GHES: 32-bit buildfixLen Brown2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression ghes.c:(.text+0x46289): undefined reference to `__udivdi3'   in function ghes_estatus_cache_add(). Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI: APEI build fixLen Brown2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | as GHES is optional... When # CONFIG_ACPI_APEI_GHES is not set: (.init.text+0x4c22): undefined reference to `ghes_disable' Reported-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI, APEI, GHES: Add hardware memory error recovery supportHuang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | memory_failure_queue() is called when recoverable memory errors are notified by firmware to do the recovery work. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | HWPoison: add memory_failure_queue()Huang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | memory_failure() is the entry point for HWPoison memory error recovery. It must be called in process context. But commonly hardware memory errors are notified via MCE or NMI, so some delayed execution mechanism must be used. In MCE handler, a work queue + ring buffer mechanism is used. In addition to MCE, now APEI (ACPI Platform Error Interface) GHES (Generic Hardware Error Source) can be used to report memory errors too. To add support to APEI GHES memory recovery, a mechanism similar to that of MCE is implemented. memory_failure_queue() is the new entry point that can be called in IRQ context. The next step is to make MCE handler uses this interface too. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI, APEI, GHES, Error records content based throttleHuang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | printk is used by GHES to report hardware errors. Ratelimit is enforced on the printk to avoid too many hardware error reports in kernel log. Because there may be thousands or even millions of corrected hardware errors during system running. Currently, a simple scheme is used. That is, the total number of hardware error reporting is ratelimited. This may cause some issues in practice. For example, there are two kinds of hardware errors occurred in system. One is corrected memory error, because the fault memory address is accessed frequently, there may be hundreds error report per-second. The other is corrected PCIe AER error, it will be reported once per-second. Because they share one ratelimit control structure, it is highly possible that only memory error is reported. To avoid the above issue, an error record content based throttle algorithm is implemented in the patch. Where after the first successful reporting, all error records that are same are throttled for some time, to let other kinds of error records have the opportunity to be reported. In above example, the memory errors will be throttled for some time, after being printked. Then the PCIe AER error will be printked successfully. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI, APEI, GHES, printk support for recoverable error via NMIHuang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some APEI GHES recoverable errors are reported via NMI, but printk is not safe in NMI context. To solve the issue, a lock-less memory allocator is used to allocate memory in NMI handler, save the error record into the allocated memory, put the error record into a lock-less list. On the other hand, an irq_work is used to delay the operation from NMI context to IRQ context. The irq_work IRQ handler will remove nodes from lock-less list, printk the error record and do some further processing include recovery operation, then free the memory. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | lib, Make gen_pool memory allocator locklessHuang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This version of the gen_pool memory allocator supports lockless operation. This makes it safe to use in NMI handlers and other special unblockable contexts that could otherwise deadlock on locks. This is implemented by using atomic operations and retries on any conflicts. The disadvantage is that there may be livelocks in extreme cases. For better scalability, one gen_pool allocator can be used for each CPU. The lockless operation only works if there is enough memory available. If new memory is added to the pool a lock has to be still taken. So any user relying on locklessness has to ensure that sufficient memory is preallocated. The basic atomic operation of this allocator is cmpxchg on long. On architectures that don't have NMI-safe cmpxchg implementation, the allocator can NOT be used in NMI handler. So code uses the allocator in NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | lib, Add lock-less NULL terminated single listHuang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cmpxchg is used to implement adding new entry to the list, deleting all entries from the list, deleting first entry of the list and some other operations. Because this is a single list, so the tail can not be accessed in O(1). If there are multiple producers and multiple consumers, llist_add can be used in producers and llist_del_all can be used in consumers. They can work simultaneously without lock. But llist_del_first can not be used here. Because llist_del_first depends on list->first->next does not changed if list->first is not changed during its operation, but llist_del_first, llist_add, llist_add (or llist_del_all, llist_add, llist_add) sequence in another consumer may violate that. If there are multiple producers and one consumer, llist_add can be used in producers and llist_del_all or llist_del_first can be used in the consumer. This can be summarized as follow: | add | del_first | del_all add | - | - | - del_first | | L | L del_all | | | - Where "-" stands for no lock is needed, while "L" stands for lock is needed. The list entries deleted via llist_del_all can be traversed with traversing function such as llist_for_each etc. But the list entries can not be traversed safely before deleted from the list. The order of deleted entries is from the newest to the oldest added one. If you want to traverse from the oldest to the newest, you must reverse the order by yourself before traversing. The basic atomic operation of this list is cmpxchg on long. On architectures that don't have NMI-safe cmpxchg implementation, the list can NOT be used in NMI handler. So code uses the list in NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHGHuang Ying2011-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmpxchg() is widely used by lockless code, including NMI-safe lockless code. But on some architectures, the cmpxchg() implementation is not NMI-safe, on these architectures the lockless code may need a spin_trylock_irqsave() based implementation. This patch adds a Kconfig option: ARCH_HAVE_NMI_SAFE_CMPXCHG, so that NMI-safe lockless code can depend on it or provide different implementation according to it. On many architectures, cmpxchg is only NMI-safe for several specific operand sizes. So, ARCH_HAVE_NMI_SAFE_CMPXCHG define in this patch only guarantees cmpxchg is NMI-safe for sizeof(unsigned long). Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Richard Henderson <rth@twiddle.net> CC: Mikael Starvik <starvik@axis.com> Acked-by: David Howells <dhowells@redhat.com> CC: Yoshinori Sato <ysato@users.sourceforge.jp> CC: Tony Luck <tony.luck@intel.com> CC: Hirokazu Takata <takata@linux-m32r.org> CC: Geert Uytterhoeven <geert@linux-m68k.org> CC: Michal Simek <monstr@monstr.eu> Acked-by: Ralf Baechle <ralf@linux-mips.org> CC: Kyle McMartin <kyle@mcmartin.ca> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Chen Liqin <liqin.chen@sunplusct.com> CC: "David S. Miller" <davem@davemloft.net> CC: Ingo Molnar <mingo@redhat.com> CC: Chris Zankel <chris@zankel.net> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI, APEI, Add WHEA _OSC supportHuang Ying2011-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | APEI firmware first mode must be turned on explicitly on some machines, otherwise there may be no GHES hardware error record for hardware error notification. APEI bit in generic _OSC call can be used to do that, but on some machine, a special WHEA _OSC call must be used. This patch adds the support to that WHEA _OSC call. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI, APEI, Add APEI bit support in generic _OSC callHuang Ying2011-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In APEI firmware first mode, hardware error is reported by hardware to firmware firstly, then firmware reports the error to Linux in a GHES error record via POLL/SCI/IRQ/NMI etc. This may result in some issues if OS has no full APEI support. So some firmware implementation will work in a back-compatible mode by default. Where firmware will only notify OS in old-fashion, without GHES record. For example, for a fatal hardware error, only NMI is signaled, no GHES record. To gain full APEI power on these machines, APEI bit in generic _OSC call can be specified to tell firmware that Linux has full APEI support. This patch adds the APEI bit support in generic _OSC call. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
| | * | | | ACPI, APEI, GHES, Support disable GHES at boot timeHuang Ying2011-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some machine may have broken firmware so that GHES and firmware first mode should be disabled. This patch adds support to that. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>