path: root/arch
Commit message  /  Author  /  Age
* Make platform-specific Feather-Trace depend on !CONFIG_DEBUG_RODATA  (Bjoern B. Brandenburg, 2010-06-09)
    Feather-Trace rewrites instructions in the kernel's .text segment. This segment may be write-protected if CONFIG_DEBUG_RODATA is selected. In this case, fall back to the default flag-based Feather-Trace implementation. In the future, we could either adopt the ftrace method of rewriting .text addresses using non-.text mappings, or we could consider replacing Feather-Trace with ftrace altogether. For now, this patch avoids unexpected runtime errors.
* Make C-EDF depend on x86 and SYSFS  (Andrea Bastoni, 2010-06-09)
    C-EDF depends on intel_cacheinfo.c (for get_shared_cpu_map()), which is only available on x86 architectures. Furthermore, get_shared_cpu_map() is only available if the SYSFS filesystem is present.
* Make smp_send_pull_timers() optional.  (Bjoern B. Brandenburg, 2010-06-09)
    There is currently no need to implement this on ARM, so let's make it optional instead.
* Make __ARCH_HAS_FEATHER_TRACE a proper CONFIG_ variable.  (Bjoern B. Brandenburg, 2010-06-09)
    The idea of the Feather-Trace default implementation is that LITMUS^RT should work without a specialized Feather-Trace implementation present. This was actually broken. Changes litmus/feather_trace.h to only include asm/feather_trace.h if actually promised by the architecture.
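    The include guard this entry describes boils down to something like the following sketch. The symbol spelling CONFIG_ARCH_HAS_FEATHER_TRACE and the shape of the fallback (ft_event()/ft_flag_is_set()) are assumptions for illustration, not taken from the patch:

        /* litmus/feather_trace.h, sketched: pull in the architecture's
         * implementation only when the architecture promises one. */
        #ifdef CONFIG_ARCH_HAS_FEATHER_TRACE
        #include <asm/feather_trace.h>
        #endif

        #ifndef ft_event
        /* generic flag-based fallback: no .text rewriting required */
        #define ft_event(id, callback) \
                do { if (ft_flag_is_set(id)) callback; } while (0)
        #endif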
* Merge branch 'linux-preempt-rt' into wip-merge-preempt-2.6.33-rt  (Andrea Bastoni, 2010-05-31)
|\
    Simple merge between 2010.1 (vanilla 2.6.32) and 2.6.33.5-rt22 with conflicts resolved. This commit does not compile; the following main problems are still unresolved:
    - spinlock -> raw_spinlock (semantics: spinlocks can sleep in -rt)
    - rwlock and wait_queue_t lock
    - kfifo API changes
    - sched_class API changes (get_rr_interval() signature change)
    Conflicts:
        Makefile
        arch/x86/include/asm/unistd_32.h
        arch/x86/kernel/syscall_table_32.S
        include/linux/hrtimer.h
        kernel/hrtimer.c
        kernel/sched.c
        kernel/sched_fair.c
| * Merge stable/linux-2.6.33.y into rt/2.6.33  (Thomas Gleixner, 2010-05-31)
| |\
    Conflicts:
        Makefile
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs  (Andreas Herrmann, 2010-05-26)
    commit f01487119dda3d9f58c9729c7361ecc50a61c188 upstream.
    If the host CPU is exposed to a guest, the OSVW MSRs are not guaranteed to be present and a GP fault occurs. Thus checking the feature flag is essential.
    Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
    LKML-Reference: <20100427101348.GC4489@alberich.amd.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
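    The guard amounts to something like this sketch (the surrounding context is assumed; the feature-bit macro and MSR names are the standard kernel ones):

        /* gate all OSVW MSR accesses on the feature bit */
        if (cpu_has(c, X86_FEATURE_OSVW)) {
                u64 len, status;

                rdmsrl(MSR_AMD64_OSVW_ID_LENGTH, len);
                rdmsrl(MSR_AMD64_OSVW_STATUS, status);
                /* ... evaluate erratum bits only when the MSRs exist ... */
        }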
| | * x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments  (Frank Arnold, 2010-05-26)
    commit 7f284d3cc96e02468a42e045f77af11e5ff8b095 upstream.
    When running a guest kernel on xen we get:
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
        IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
        PGD 0
        Oops: 0000 [#1] SMP
        last sysfs file:
        CPU 0
        Modules linked in:
        Pid: 0, comm: swapper Tainted: G W 2.6.34-rc3 #1 /HVM domU
        RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
        RSP: 0018:ffff880002203e08  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
        RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
        RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
        R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
        R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
        FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
        Stack:
         ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
        <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
        <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
        Call Trace:
        <IRQ>
         [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
         [<ffffffff81059140>] ? mod_timer+0x23/0x25
         [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
         [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
         [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
         [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
         [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
         [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
         [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
         [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
         [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
        <EOI>
         [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
         [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
         [<ffffffff810114eb>] ? default_idle+0x36/0x53
         [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
         [<ffffffff81423a9a>] rest_init+0x7e/0x80
         [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
         [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
         [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
        Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
        RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
        RSP <ffff880002203e08>
        CR2: 0000000000000038
        ---[ end trace a7919e7f17c0a726 ]---
    The L3 cache index disable feature of AMD CPUs has to be disabled if the kernel is running as a guest on top of a hypervisor, because northbridge devices are not available to the guest. Currently, this fixes a boot crash on top of Xen. In the future this will become an issue on KVM as well. Check if northbridge devices are present and do not enable the feature if there are none.
    [ hpa: backported to 2.6.34 ]
    Signed-off-by: Frank Arnold <frank.arnold@amd.com>
    LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
    Acked-by: Borislav Petkov <borislav.petkov@amd.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
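    The check described above reduces to an early bailout along these lines (a sketch; the exact call site is not taken from the patch, but num_k8_northbridges is the counter named in the related K8 NB commit below):

        /* skip the L3 index-disable setup when no northbridge is visible */
        if (num_k8_northbridges == 0)
                return;     /* running as a guest: nothing to program */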
| | * x86, k8: Fix build error when K8_NB is disabled  (Borislav Petkov, 2010-05-26)
    commit ade029e2aaacc8965a548b0b0f80c5bee97ffc68 upstream.
    K8_NB depends on PCI, and when the latter is disabled (allnoconfig) we fail at the final linking stage due to a missing exported num_k8_northbridges. Add a header stub for that.
    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    LKML-Reference: <20100503183036.GJ26107@aftab>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * powerpc/perf_event: Fix oops due to perf_event_do_pending call  (Paul Mackerras, 2010-05-26)
    commit 0fe1ac48bef018bed896307cd12f6ca9b5e704ab upstream.
    Anton Blanchard found that large POWER systems would occasionally crash in the exception exit path when profiling with perf_events. The symptom was that an interrupt would occur late in the exit path when the MSR[RI] (recoverable interrupt) bit was clear. Interrupts should be hard-disabled at this point but they were enabled. Because the interrupt was not recoverable, the system panicked.
    The reason is that the exception exit path was calling perf_event_do_pending after hard-disabling interrupts, and perf_event_do_pending will re-enable interrupts.
    The simplest and cleanest fix for this is to use the same mechanism that 32-bit powerpc does, namely to cause a self-IPI by setting the decrementer to 1. This means we can remove the tests in the exception exit path and raw_local_irq_restore. This also makes sure that the call to perf_event_do_pending from timer_interrupt() happens within irq_enter/irq_exit. (Note that calling perf_event_do_pending from timer_interrupt does not mean that there is a possible 1/HZ latency; setting the decrementer to 1 ensures that the timer interrupt will happen immediately, i.e. within one timebase tick, which is a few nanoseconds or tens of nanoseconds.)
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
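    The self-IPI described above is a one-liner in spirit (a sketch; set_dec() is the standard powerpc helper for writing the decrementer):

        /* instead of calling perf_event_do_pending() from the exit path,
         * force an immediate timer interrupt */
        set_dec(1);     /* decrementer fires within one timebase tick */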
| | * ptrace: fix return value of do_syscall_trace_enter()  (Gerald Schaefer, 2010-05-26)
    commit 545c174d1f093a462b4bb9131b23d5ea72a600e1 upstream.
    strace may change the system call number, so regs->gprs[2] must not be read before tracehook_report_syscall_entry(). This fixes a bug where "strace -f" will hang after a vfork().
    Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
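    The corrected ordering looks roughly like this (a sketch of the s390 entry hook; the surrounding function shape is assumed):

        /* let the tracer rewrite the syscall number first, then read it */
        if (test_thread_flag(TIF_SYSCALL_TRACE) &&
            tracehook_report_syscall_entry(regs))
                return -1;              /* tracer cancelled the syscall */
        return regs->gprs[2];           /* possibly-rewritten number, read last */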
| * | Merge branch '2.6.33.4' into rt/2.6.33  (Thomas Gleixner, 2010-05-12)
| |\|
    Conflicts:
        Makefile
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * MIPS: Sibyte: Apply M3 workaround only on affected chip types and versions.  (Ralf Baechle, 2010-05-12)
    (cherry picked from commit e65c7f33d75e977350ca350573d93c517ec02776)
    Previously it was unconditionally used on all Sibyte family SOCs. The M3 bug has to be handled in the TLB exception handler, which is extremely performance sensitive, so this modification is expected to deliver around a 2-3% performance improvement. This is important as required changes to the M3 workaround will make it more costly.
    Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * pxa/colibri: fix missing #include <mach/mfp.h> in colibri.h  (Jakob Viketoft, 2010-05-12)
    commit ccb8d8d070b8f25f0163da5c9ceacf63a5169540 upstream.
    The use of mfp_cfg_t causes build errors without including <mach/mfp.h>.
    CC: Daniel Mack <daniel@caiaq.de>
    Signed-off-by: Jakob Viketoft <jakob.viketoft@bitsim.com>
    Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * sparc64: Fix memory leak in pci_register_iommu_region().  (David S. Miller, 2010-05-12)
    [ Upstream commit e182c77cc291456eed127b1472952ddb59a81a9d ]
    Found by kmemleak. If request_resource() fails, we leak the struct resource we allocated to represent the IOMMU mapping area. This actually happens on sun4v machines because the IOMEM area is only reported sans the IOMMU region, unlike all previous systems. I'll need to fix that at some point, but for now fix the leak.
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * sparc64: Adjust __raw_local_irq_save() to cooperate in NMIs.  (David S. Miller, 2010-05-12)
    [ Upstream commits 0c25e9e6cbe7b233bb91d14d0e2c258bf8e6ec83 and c011f80ba0912486fe51dd2b3f71d9b33a151188 ]
    If we are in an NMI, then doing a plain raw_local_irq_disable() will write PIL_NORMAL_MAX into %pil, which is lower than PIL_NMI, and thus we'll re-enable NMIs and recurse. Doing a simple:
        %pil = %pil | PIL_NORMAL_MAX
    does what we want: if we're already at PIL_NMI (15) we leave it at that setting, else we set it to PIL_NORMAL_MAX (14). This should get the function tracer working on sparc64.
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
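    In C pseudocode the idea is simply an OR rather than an overwrite; read_pil()/write_pil() below are hypothetical stand-ins for the rdpr/wrpr %pil instructions:

        /* NMI-safe interrupt save: never drop below the current PIL */
        unsigned long flags = read_pil();
        write_pil(flags | PIL_NORMAL_MAX);  /* 14 stays 14, 15 (NMI) stays 15 */
        return flags;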
| | * sparc64: Use kstack_valid() in die_if_kernel().  (David S. Miller, 2010-05-12)
    [ Upstream commit cb256aa60409efd803806cfb0528a4b3f8397dba ]
    This gets rid of a local function (is_kernel_stack()) which tries to do the same thing, yet poorly in that it doesn't handle IRQ stacks properly.
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * sparc64: Fix hardirq tracing in trap return path.  (David S. Miller, 2010-05-12)
    [ Upstream commit 28a1f533ae8606020238b840b82ae70a3f87609e ]
    We can overflow the hardirq stack if we set the %pil here so early; just let the normal control flow do it. This is fine, as we are allowed to do the actual IRQ enable at any point after we call trace_hardirqs_on.
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * sparc64: Fix PREEMPT_ACTIVE value.  (David S. Miller, 2010-05-12)
    [ Upstream commit 6c94b1ee0ca2bfb526d779c088ec20da6a3761db ]
    It currently overlaps the NMI bit.
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * sparc64: Use correct pt_regs in decode_access_size() error paths.  (David S. Miller, 2010-05-12)
    [ Upstream commit baa06775e224e9f74e5c2de894c95cd49678beff ]
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * powerpc: Reduce printk from pseries_mach_cpu_die()  (Vaidyanathan Srinivasan, 2010-05-12)
    commit a8e6da093ea8642b1320fb5d64134366f2a8d0ac upstream.
    Remove debug printks in pseries_mach_cpu_die(). These are noisy at runtime. Trace events can be added to instrument this section of code. The following KERN_INFO printks are removed:
        cpu 62 (hwid 62) returned from cede. Decrementer value = b2802fff Timebase value = 2fa8f95035f4a
        cpu 62 (hwid 62) got prodded to go online
        cpu 58 (hwid 58) ceding for offline with hint 2
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Cc: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * powerpc: Move checks in pseries_mach_cpu_die()  (Vaidyanathan Srinivasan, 2010-05-12)
    commit 0212f2602a38e740d5a96aba4cebfc2ebc993ecf upstream.
    Rearrange condition checks for better code readability and to prevent possible race conditions, since preferred_offline_state can potentially change during the execution of pseries_mach_cpu_die(). The patch makes pseries_mach_cpu_die() put the cpu in one of the consistent states and not run over the BUG() at the end of the function.
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Cc: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * powerpc: Reset kernel stack on cpu online from cede state  (Vaidyanathan Srinivasan, 2010-05-12)
    commit 8dbce53cc249a76e9450708d291fce5a7e29c6a1 upstream.
    Cpu hotplug (offline) without a dlpar operation will place the cpu in cede state, and the extended_cede_processor() function will return when resumed. The kernel stack pointer needs to be reset before start_secondary() is called to continue the online operation. Added a new function start_secondary_resume() to do the above steps.
    Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Cc: Gautham R Shenoy <ego@in.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * kgdb: don't needlessly skip PAGE_USER test for Fsl booke  (Wufei, 2010-05-12)
    commit 56151e753468e34aeb322af4b0309ab727c97d2e upstream.
    The bypassing of this test is a leftover from 2.4 vintage kernels and is no longer appropriate, or even used by KGDB. Currently KGDB uses probe_kernel_write() for all access to memory via the KGDB core, so the bypass can simply be deleted. This fixes CVE-2010-1446.
    CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    CC: Paul Mackerras <paulus@samba.org>
    CC: Kumar Gala <galak@kernel.crashing.org>
    Signed-off-by: Wufei <fei.wu@windriver.com>
    Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems  (Borislav Petkov, 2010-05-12)
    commit 0e152cd7c16832bd5cadee0c2e41d9959bc9b6f9 upstream.
    de957628ce7c84764ff41331111036b3ae5bad0f changed the setting of the x86_init.iommu.iommu_init function pointer to happen only when a GART IOMMU is found. One side effect of it is that num_k8_northbridges is not initialized anymore if not explicitly called. This resulted in uninitialized pointers in <arch/x86/kernel/cpu/intel_cacheinfo.c:amd_calc_l3_indices()>, for example, which uses the num_k8_northbridges thing through node_to_k8_nb_misc().
    Fix that through an initcall that runs right after the PCI subsystem and does all the scanning. Then, remove the initialization in gart_iommu_init(), which is a rootfs_initcall, and we're running before that.
    What is more, since num_k8_northbridges is being used in other places beside GART IOMMU, include it whenever we add AMD CPU support. The previous dependency chain in kconfig contained
        K8_NB depends on AGP_AMD64|GART_IOMMU
    which was clearly incorrect. The more natural way in terms of hardware dependency should be
        AGP_AMD64|GART_IOMMU depends on K8_NB depends on CPU_SUP_AMD && PCI.
    Make it so Number One!
    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    Cc: Joerg Roedel <joerg.roedel@amd.com>
    LKML-Reference: <20100312144303.GA29262@aftab>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Tested-by: Joerg Roedel <joerg.roedel@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests  (Prarit Bhargava, 2010-05-12)
    commit bbd391a15d82e14efe9d69ba64cadb855b061dba upstream.
    Upstream PV guests fail to boot because of a NULL pointer in irq_force_complete_move(). It is possible that Xen guests have irq_desc->chip_data = NULL. Test for a NULL chip_data pointer before attempting to complete an irq move.
    Signed-off-by: Prarit Bhargava <prarit@redhat.com>
    LKML-Reference: <20100427152434.16193.49104.sendpatchset@prarit.bos.redhat.com>
    Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
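    The guard amounts to an early return along these lines (a sketch; the exact placement inside irq_force_complete_move() is assumed):

        /* Xen guests may have no chip_data attached to the descriptor */
        struct irq_desc *desc = irq_to_desc(irq);
        struct irq_cfg *cfg = desc ? desc->chip_data : NULL;

        if (!cfg)
                return;         /* nothing to move for this irq */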
| | * x86: Disable large pages on CPUs with Atom erratum AAE44  (H. Peter Anvin, 2010-05-12)
    commit 7a0fc404ae663776e96db43879a0fa24fec1fa3a upstream.
    Atom erratum AAE44/AAF40/AAG38/AAH41:
        "If software clears the PS (page size) bit in a present PDE (page directory entry), that will cause linear addresses mapped through this PDE to use 4-KByte pages instead of using a large page after old TLB entries are invalidated. Due to this erratum, if a code fetch uses this PDE before the TLB entry for the large page is invalidated then it may fetch from a different physical address than specified by either the old large page translation or the new 4-KByte page translation. This erratum may also cause speculative code fetches from incorrect addresses."
    [http://download.intel.com/design/processor/specupdt/319536.pdf]
    Whereas commit 211b3d03c7400f48a781977a50104c9d12f4e229 seems to work around erratum AAH41 (mixed 4K TLBs), it only reduces the window of opportunity for the bug to occur and does not totally remove it. This patch disables mixed 4K/4MB page tables, totally avoiding the page splitting and not tripping this processor issue.
    This is based on an original patch by Colin King.
    Originally-by: Colin Ian King <colin.king@canonical.com>
    Cc: Colin Ian King <colin.king@canonical.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero  (H. Peter Anvin, 2010-05-12)
    commit 7ce5a2b9bb2e92902230e3121d8c3047fab9cb47 upstream.
    When we do a thread switch, we clear the outgoing FS/GS base if the corresponding selector is nonzero. This is taken by __switch_to() as an entry invariant; it does not verify that it is true on entry. However, copy_thread() doesn't enforce this constraint, which can result in inconsistent results after fork(). Make copy_thread() match the behavior of __switch_to().
    Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    LKML-Reference: <4BD1E061.8030605@zytor.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
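    The invariant copy_thread() must establish looks roughly like this (a sketch; variable names p/me for the child and current task are assumptions):

        /* nonzero selector implies zero saved base, as __switch_to() leaves it */
        savesegment(fs, p->thread.fsindex);
        p->thread.fs = p->thread.fsindex ? 0 : me->thread.fs;
        savesegment(gs, p->thread.gsindex);
        p->thread.gs = p->thread.gsindex ? 0 : me->thread.gs;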
| | * powernow-k8: Fix frequency reporting  (Mark Langsdorf, 2010-05-12)
    commit b810e94c9d8e3fff6741b66cd5a6f099a7887871 upstream.
    With F10, model 10, all valid frequencies are in the ACPI _PST table.
    Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
    LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org>
    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Reviewed-by: Thomas Renninger <trenn@suse.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | AT91: PIT: Remove irq handler when clock event is unused  (Benedikt Spranger, 2010-05-02)
    Set up and remove the interrupt handler in clock event mode selection. This avoids calling the (shared) interrupt handler when the device is not used.
    Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | fs-inode_lock-scale-4  (Nick Piggin, 2010-04-27)
    Protect inode->i_count with i_lock, rather than having it atomic. The next step should also be to move things together (e.g. the refcount increment into d_instantiate, which will remove a lock/unlock cycle on i_lock).
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: John Stultz <johnstul@us.ibm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
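    The conversion pattern at each reference site is simply (a sketch of the before/after shape, not a specific hunk from the patch):

        /* before: atomic_inc(&inode->i_count); */
        spin_lock(&inode->i_lock);
        inode->i_count++;
        spin_unlock(&inode->i_lock);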
| * | dcache-split-inode_lock  (Nick Piggin, 2010-04-27)
    dcache_inode_lock can be replaced with per-inode locking. Use existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock).
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: John Stultz <johnstul@us.ibm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | fs-dcache_lock-remove  (Nick Piggin, 2010-04-27)
    dcache_lock no longer protects anything (I hope). Remove it. This breaks a lot of the tree where I haven't thought about the problem, but it simplifies the dcache.c code quite a bit (and it's also probably a good thing to break unconverted code). So I include this here before making further changes to the locking.
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: John Stultz <johnstul@us.ibm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | fs-dcache-scale-d_unhashed  (Nick Piggin, 2010-04-27)
    Protect the d_unhashed(dentry) condition with d_lock.
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: John Stultz <johnstul@us.ibm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | fs-dcache-scale-d_count  (Nick Piggin, 2010-04-27)
    Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock.
    XXX: This patch does not boot on its own.
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: John Stultz <johnstul@us.ibm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | powerpc: Replace kmap_atomic with kmap in pte_offset_map  (Kevin Hao, 2010-04-27)
    The pte_offset_map/pte_offset_map_nested macros use kmap_atomic to get the virtual address for the pte table, but kmap_atomic disables preemption. Hence there will be a call trace if we acquire a spin lock after invoking pte_offset_map/pte_offset_map_nested in preempt-rt. To fix it, I've replaced kmap_atomic with kmap in these macros.
    Signed-off-by: Kevin Hao <kexin.hao@windriver.com>
    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    LKML-Reference: <ffaf532c138188b526a8c623ed3c7f5067da6d68.1267566249.git.paul.gortmaker@windriver.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.33.y into rt/2.6.33  (Thomas Gleixner, 2010-04-27)
| |\|
    Conflicts:
        Makefile
        arch/x86/include/asm/rwsem.h
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * x86/gart: Disable GART explicitly before initialization  (Joerg Roedel, 2010-04-26)
    commit 4b83873d3da0704987cb116833818ed96214ee29 upstream.
    If we boot into a crash kernel, the GART might still be enabled and its caches might be dirty. This can result in undefined behavior later. Fix it by explicitly disabling the GART hardware before initialization and flushing the caches after enablement.
    Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * KVM: x86: Fix TSS size check for 16-bit tasks  (Jan Kiszka, 2010-04-26)
    (Cherry-picked from commit e8861cfe2c75bdce36655b64d7ce02c2b31b604d)
    A 16-bit TSS is only 44 bytes long. So make sure to test for the correct size on task switch.
    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * KVM: fix the handling of dirty bitmaps to avoid overflows  (Takuya Yoshikawa, 2010-04-26)
    (Cherry-picked from commit 87bf6e7de1134f48681fd2ce4b7c1ec45458cb6d)
    Int is not long enough to store the size of a dirty bitmap. This patch fixes this problem with the introduction of a wrapper function to calculate the sizes of dirty bitmaps. Note: in mark_page_dirty(), we have to consider the fact that __set_bit() takes the offset as int, not long.
    Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
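    The wrapper amounts to sizing the bitmap in a long-typed expression, roughly (a sketch consistent with the description above; the exact helper name is an assumption):

        static inline long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
        {
                /* one bit per page, rounded up to whole longs */
                return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
        }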
| | * KVM: MMU: fix kvm_mmu_zap_page() and its calling path  (Xiao Guangrong, 2010-04-26)
    (Cherry-picked from commit 77662e0028c7c63e34257fda03ff9625c59d939d)
    This patch fixes:
    - calculate the zapped page number properly in mmu_zap_unsync_children()
    - calculate the freed page number properly in kvm_mmu_change_mmu_pages()
    - if a children page is zapped, the hlist walking should restart
    KVM-Stable-Tag.
    Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * KVM: VMX: Save/restore rflags.vm correctly in real mode  (Avi Kivity, 2010-04-26)
    (Cherry-picked from commit 78ac8b47c566dd6177a3b9b291b756ccb70670b7)
    Currently we set eflags.vm unconditionally when entering real mode emulation through virtual-8086 mode, and clear it unconditionally when we enter protected mode. This means that the following sequence
        KVM_SET_REGS   (rflags.vm=1)
        KVM_SET_SREGS  (cr0.pe=1)
    ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().
    Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode: reads and writes to those bits access a shadow register instead of the actual register.
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL  (Andre Przywara, 2010-04-26)
    (Cherry-picked from commit 114be429c8cd44e57f312af2bbd6734e5a185b0d)
    There is a quirk for AMD K8 CPUs in many Linux kernels (see arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that clears bit 10 in that MCE-related MSR. KVM can only cope with all zeros or all ones, so it will inject a #GP into the guest, which will make it panic. So let's add a quirk to the quirk and ignore this single cleared bit. This fixes -cpu kvm64 on all machines and -cpu host on K8 machines with some guest Linux kernels.
    Signed-off-by: Andre Przywara <andre.przywara@amd.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
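    The quirk-for-the-quirk reduces to widening the accepted values in the MC4_CTL write path, roughly (a sketch; the surrounding switch/case context is assumed, not taken from the patch):

        /* accept all-ones with bit 10 cleared, as K8 quirked kernels write it */
        if (data != 0 && (data | (1ULL << 10)) != ~0ULL)
                return 1;       /* anything else still injects #GP */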
| | * KVM: Don't spam kernel log when injecting exceptions due to bad cr writes  (Avi Kivity, 2010-04-26)
    (Cherry-picked from commit d6a23895aa82353788a1cc5a1d9a1c963465463e)
    These are guest-triggerable.
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails  (Takuya Yoshikawa, 2010-04-26)
    (Cherry-picked from commit b7af40433870aa0636932ad39b0c48a0cb319057)
    svm_create_vcpu() does not free the pages allocated during the creation when it fails to complete the allocations. This patch fixes it.
    Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * KVM: VMX: Update instruction length on intercepted BP  (Jan Kiszka, 2010-04-26)
    (Cherry-picked from commit c573cd22939e54fc1b8e672054a505048987a7cb)
    We intercept #BP while in guest debugging mode. As VM exits due to intercepted exceptions do not necessarily come with valid idt_vectoring, we have to update event_exit_inst_len explicitly in such cases. At least in the absence of migration, this ensures that re-injections of #BP will find and use the correct instruction length.
    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * powerpc: Fix SMP build with disabled CPU hotplugging.  (Adam Lackorzynski, 2010-04-26)
    commit 5b72d74ce2fccca2a301de60f31b16ddf5c93984 upstream.
    Compiling 2.6.33 with SMP enabled and HOTPLUG_CPU disabled gives me the following link errors:
        LD      init/built-in.o
        LD      .tmp_vmlinux1
        arch/powerpc/platforms/built-in.o: In function `.smp_xics_setup_cpu':
        smp.c:(.devinit.text+0x88): undefined reference to `.set_cpu_current_state'
        smp.c:(.devinit.text+0x94): undefined reference to `.set_default_offline_state'
        arch/powerpc/platforms/built-in.o: In function `.smp_pSeries_kick_cpu':
        smp.c:(.devinit.text+0x13c): undefined reference to `.set_preferred_offline_state'
        smp.c:(.devinit.text+0x148): undefined reference to `.get_cpu_current_state'
        smp.c:(.devinit.text+0x1a8): undefined reference to `.get_cpu_current_state'
        make: *** [.tmp_vmlinux1] Error 1
    The following change fixes that for me and seems to work as expected.
    Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
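    One plausible shape of such a fix is no-op stubs for the offline-state helpers when CONFIG_HOTPLUG_CPU is off; the signatures below are assumptions inferred from the link errors, not taken from the patch:

        #ifndef CONFIG_HOTPLUG_CPU
        /* hypothetical stubs so SMP-only builds still link */
        static inline void set_cpu_current_state(int cpu, int state) { }
        static inline void set_default_offline_state(int cpu) { }
        static inline void set_preferred_offline_state(int cpu, int state) { }
        static inline int get_cpu_current_state(int cpu) { return 0; }
        #endif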
| | * perf_events, x86: Implement Intel Westmere/Nehalem-EX support  (Peter Zijlstra, 2010-04-26)
    Original patch commit ids: 452a339a976e7f782c786eb3f73080401e2fa3a6 and 134fbadf028a5977a1b06b0253d3ee33e6f0c642
    perf_events, x86: Implement Intel Westmere support
    The new Intel documentation includes Westmere arch specific event maps that are significantly different from the Nehalem ones. Add support for this generation. Found the CPUID model numbers on Wikipedia. Also amend some Nehalem constraints; spotted those when looking for the differences between Nehalem and Westmere.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Stephane Eranian <eranian@google.com>
    LKML-Reference: <20100127221122.151865645@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    perf, x86: Enable Nehalem-EX support
    According to Intel Software Developer's Manual Volume 3B, the Nehalem-EX PMU is just like regular Nehalem (except for the uncore support, which is completely different).
    Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: Lin Ming <ming.m.lin@intel.com>
    LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Cc: Youquan Song <youquan.song@linux.intel.com>
| | * x86/PCI: irq and pci_ids patch for Intel Cougar Point DeviceIDs  (Seth Heasley, 2010-04-26)
    commit 93da6202264ce1256b04db8008a43882ae62d060 upstream.
    This patch adds the Intel Cougar Point (PCH) LPC and SMBus Controller DeviceIDs.
    Signed-off-by: Seth Heasley <seth.heasley@intel.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Cc: maximilian attems <max@stro.at>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write  (Avi Kivity, 2010-04-26)
    commit 0d1622d7f526311d87d7da2ee7dd14b73e45d3fc upstream.
    The Intel Architecture Optimization Reference Manual states that a short load that follows a long store to the same object will suffer a store forwarding penalty, particularly if the two accesses use different addresses. Trivially, a long load that follows a short store will also suffer a penalty.
    __downgrade_write() in rwsem incurs both penalties: the increment operation will not be able to reuse a recently-loaded rwsem value, and its result will not be reused by any recently-following rwsem operation.
    A comment in the code states that this is because 64-bit immediates are special and expensive; but while they are slightly special (only a single instruction allows them), they aren't expensive: a test shows that two loops, one loading a 32-bit immediate and one loading a 64-bit immediate, both take 1.5 cycles per iteration.
    Fix this by changing __downgrade_write to use the same add instruction on i386 and on x86_64, so that it uses the same operand size as all the other rwsem functions.
    Signed-off-by: Avi Kivity <avi@redhat.com>
    LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
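    Simplified, the fixed function is a single locked add with the natural operand size on both i386 and x86-64 (a sketch; the real function also wakes queued readers when the resulting count indicates waiters):

        static inline void __downgrade_write(struct rw_semaphore *sem)
        {
                /* same add instruction and operand size as the other rwsem ops,
                 * so the CPU can forward the preceding rwsem store */
                asm volatile(LOCK_PREFIX "add %1,%0"
                             : "+m" (sem->count)
                             : "er" (-RWSEM_WAITING_BIAS)
                             : "memory", "cc");
        }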