litmus-rt.git - The LITMUS^RT kernel.

	Commit message	Author	Age
*	[NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support	Eric Dumazet	2007-04-26
\|	Now that network timestamps use ktime_t infrastructure, we can add a new SOL_SOCKET sockopt SO_TIMESTAMPNS. This command is similar to SO_TIMESTAMP, but permits transmission of a 'timespec struct' instead of a 'timeval struct' control message (nanosecond resolution instead of microsecond). The control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP. A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS: the two modes are mutually exclusive. sock_recv_timestamp() became too big to be fully inlined, so I added a __sock_recv_timestamp() helper function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: linux-arch@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution	Eric Dumazet	2007-04-26
\|	Now that network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: div64_64 consolidate (rev3)	Stephen Hemminger	2007-04-26
\|	Here is the current version of the 64 bit divide common code. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
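For readers unfamiliar with what a 64-by-64-bit divide helper has to do when the compiler cannot emit a native instruction, here is a minimal restoring (shift-and-subtract) division. It is an illustrative sketch only, not the consolidated kernel routine, and the name div64_64_sketch() is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 64/64 division, one quotient bit per step. */
uint64_t div64_64_sketch(uint64_t dividend, uint64_t divisor)
{
	uint64_t quotient = 0;
	uint64_t remainder = 0;
	int bit;

	assert(divisor != 0);
	for (bit = 63; bit >= 0; bit--) {
		/* The top bit of the remainder is lost by the shift below,
		 * so remember it as a carry. */
		uint64_t carry = remainder >> 63;

		/* Bring down the next dividend bit. */
		remainder = (remainder << 1) | ((dividend >> bit) & 1);

		/* If the (conceptually 65-bit) remainder holds the divisor,
		 * subtract it and set this quotient bit. */
		if (carry || remainder >= divisor) {
			remainder -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	return quotient;
}
```

The loop always runs 64 iterations, which keeps the sketch short; a production helper would normally take shortcuts when the divisor fits in 32 bits.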
*	[PATCH] Proper fix for highmem kmap_atomic functions for VMI for 2.6.21	Zachary Amsden	2007-04-08
\|	Since lazy MMU batching mode still allows interrupts to enter, it is possible for interrupt handlers to try to use kmap_atomic, which fails when lazy mode is active, since the PTE update to highmem will be delayed. The best workaround is to issue an explicit flush in kmap_atomic_functions case; this is the only way nested PTE updates can happen in the interrupt handler. Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix. This patch gets reverted again when we start 2.6.22 and the bug gets fixed differently. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: Andi Kleen <ak@muc.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E	Andi Kleen	2007-04-02
\|	AMD dual core laptops with C1E do not run the APIC timer correctly when they go idle. Previously the code assumed this only happened on C2 or deeper. But not all of these systems report support for C2. Use an AMD-supplied snippet to detect C1E being enabled and then disable local apic timer use. This supersedes an earlier workaround using DMI detection of specific systems. Thanks to Mark Langsdorf for the detection snippet. Signed-off-by: Andi Kleen <ak@suse.de>
*	[PATCH] tty: minor merge correction	Alan Cox	2007-03-27
\|	It's now used, because we added the new definitions and so enabled all the goodies on i386. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] i386: clear segment register padding in core dumps	Roland McGrath	2007-03-23
\|	The segment register slots in struct pt_regs are padded to 32 bits. Some of these are stored with instructions like "pushl %es", which leaves the high 16 bits as they were. So the high bits of these fields in struct pt_regs contain kernel stack garbage. These bits are ignored by everything and never leak to user space, except in core dumps. The user struct pt_regs is always at the base of the thread's kernel stack and so it seems unlikely the information that leaks from here is ever worthwhile so as to be a security concern, but I'm not sure about that. It has been this way for ages; userland consumers of core dumps all mask off these high bits themselves. So it is not urgent. This change masks off the padding bits of the segment register slots in core dumps. ptrace already masks off these high bits, so this makes the values in core dumps consistent with what ptrace would report just before the process died. As I read the processor manuals, the cs and ss values will always be padded with zero bits rather than stack garbage. But unlike "pushl %es", this is not simple to test with a userland program. So I added the two instructions rather than wonder if they are really never necessary. I think that x86_64 does not have this problem (for either 32-bit or 64-bit processes). It only uses "mov" instructions from segment registers, which zero-extend. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
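The masking described above amounts to keeping the low 16 bits of each 32-bit slot. The following sketch shows that operation on a stand-in structure; struct seg_slots_sketch and both function names are hypothetical and do not reproduce the kernel's struct pt_regs layout or the actual core dump code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the segment register slots, each padded
 * to 32 bits although a selector is only 16 bits wide. */
struct seg_slots_sketch {
	uint32_t ds;
	uint32_t es;
	uint32_t cs;
	uint32_t ss;
};

/* Keep the 16-bit selector, drop whatever was left in the padding. */
static inline uint32_t mask_selector(uint32_t slot)
{
	return slot & 0xffffu;
}

/* Clear the padding bits of every slot before the values are written out. */
void clear_segment_padding(struct seg_slots_sketch *regs)
{
	assert(regs != NULL);
	regs->ds = mask_selector(regs->ds);
	regs->es = mask_selector(regs->es);
	regs->cs = mask_selector(regs->cs);
	regs->ss = mask_selector(regs->ss);
}
```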
*	[PATCH] i386: add command line option "local_apic_timer_c2_ok"	Thomas Gleixner	2007-03-23
\|	It turned out that it is almost impossible to trust ACPI, BIOS & Co. regarding the C states. This was the reason to switch the local apic timer off in C2 state already. OTOH there are sane and well behaving systems, which get punished by that decision. Allow the user to confirm that the local apic timer is trustworthy in C2 state. This keeps the default behaviour on the safe side. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] i386: fix typo in sync_constant_test_bit()'s name	Jeremy Fitzhardinge	2007-03-16
\|	Fix typo in sync_constant_test_bit()'s name, so sync_bitops.h is consistent with bitops.h Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Disable NMI watchdog by default properly	Linus Torvalds	2007-03-14
\|	This reverts commit 6ebf622b2577c50b1f496bd6a5e8739e55ae7b1c and replaces it with one that actually works. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] fastcall still doesn't make sense in paravirt	Al Viro	2007-03-14
\|	Andi had removed a bunch of those, but one more had crept in... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] Fix vmi time header bug	Zachary Amsden	2007-03-12
\|	Some gcc put this function in .init.text because the header didn't match. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] i386: make x86_64 tsc header require i386 rather than vice-versa	Andres Salomon	2007-03-06
\|	Prior to commit 95492e4646e5de8b43d9a7908d6177fb737b61f0 ([PATCH] x86: rewrite SMP TSC sync code), the headers in asm-i386 did not really require anything in include/asm-x86_64. This means that distributions such as Fedora did not include asm-x86_64 in kernel-devel headers for i386. Ingo's commit changed that, and broke things. This is easy enough to hack around in package builds by just including asm-x86_64 on i386, but that's kind of annoying. If anything, x86_64 should depend upon i386, not the other way around. This patch changes it so that asm-x86_64/tsc.h includes asm-i386/tsc.h, rather than vice-versa. Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] fix build with CONFIG_NO_IDLE_HZ=n	Andrew Morton	2007-03-06
\|	arch/i386/kernel/vmi.c: In function 'vmi_safe_halt': arch/i386/kernel/vmi.c:262: warning: implicit declaration of function 'vmi_stop_hz_timer' arch/i386/kernel/vmi.c:266: warning: implicit declaration of function 'vmi_account_time_restart_hz_timer' Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zachary Amsden <zach@vmware.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] disable NMI watchdog by default	Ingo Molnar	2007-03-05
\|	there's a new NMI watchdog related problem: KVM crashes on certain bzImages because ... we enable the NMI watchdog by default (even if the user does not ask for it), and no other OS on this planet does that so KVM doesn't have emulation for that yet. So KVM injects a #GP, which crashes the Linux guest: general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 EIP: 0060:[<c011a8ae>] Not tainted VLI EFLAGS: 00000246 (2.6.20-rc5-rt0 #3) EIP is at setup_apic_nmi_watchdog+0x26d/0x3d3 and no, i did /not/ request an nmi_watchdog on the boot command line! Solution: turn off that darn thing! It's a debug tool, not a 'make life harder' tool!! with this patch the KVM guest boots up just fine. And with this my laptop (Lenovo T60) also stopped its sporadic hard hanging (sometimes in acpi_init(), sometimes later during bootup, sometimes much later during actual use) as well. It hung with both nmi_watchdog=1 and nmi_watchdog=2, so it's generally the fact of NMI injection that is causing problems, not the NMI watchdog variant, nor any particular bootup code. [ NMI breaks on some systems, esp in combination with SMM -Arjan ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] vmi: apic ops	Zachary Amsden	2007-03-05
\|	Use para_fill instead of directly setting the APIC ops to the result of the vmi_get_function call - this allows one to implement a VMI ROM without implementing APIC functions, just using the native APIC functions. While doing this, I realized that there is a lot more cleanup that should have been done. Basically, we should never assume that the ROM implements a specific set of functions, and always allow fallback to the native implementation. This is critical for future compatibility. Signed-off-by: Anthony Liguori <anthony@codemonkey.ws> Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] vmi: pit override	Zachary Amsden	2007-03-05
\|	The time_init_hook in paravirt-ops no longer functions in the correct manner after the integration of the hrtimers code. The problem is that now the call path for time initialization is: time_init : late_time_init = hpet_time_init; late_time_init -> hpet_time_init: setup_pit_timer (BAD) do_time_init --> (via paravirt.h) time_init_hook --> (via arch_hooks.h) time_init_hook (in SUBARCH/setup.c) If this isn't confusing enough, the paravirt case goes through an indirect function pointer in the paravirt-ops table. The problem is, by the time the paravirt hook is called, the pit timer is already enabled. But paravirt guests have their own timer, and don't want to use the PIT. Rather than intensify the struggle for power going on here, just make it all nice and simple and just unconditionally do all timer setup in the late_time_init hook. This also has the advantage of enabling timers in the same place in all code paths, so everyone has the same bugs and we don't have outliers who break other code because they turn on timer too early or too late. So the paravirt-ops time init function is now by default hpet_time_init, which is the time init function used for native hardware. Paravirt guests have the chance to override this when they setup the paravirt-ops table, and should need no change. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] vmi: paravirt drop udelay op	Zachary Amsden	2007-03-05
\|	Not respecting udelay causes problems with any virtual hardware that is passed through to real hardware. This can be noticed by any device that interacts with the real world in real time - like AP startup, which takes real time. Or keyboard LEDs, which should blink in real-time. Or floppy drives, but only when passed through to a real floppy controller on OSes which can't sufficiently buffer the floppy commands to emulate a zero latency floppy. Or IDE drives, when connecting to a physical CDROM. This was mostly a hack to get the kernel to boot faster, but it introduced a number of misvirtualization bugs, and Alan and Pavel argued pretty strongly against it. We were the only client, and now want to clean up this cruft. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] vmi: fix highpte	Zachary Amsden	2007-03-05
\|	Provide a PT map hook for HIGHPTE kernels to designate where they are mapping page tables. This information is required so the physical address of PTE updates can be determined; otherwise, the mm layer would have to carry the physical address all the way to each PTE modification callsite, which is even more hideous than the macros required to provide the proper hooks. So let's not mess up arch neutral code to achieve this, but keep the horror in an #ifdef HIGHPTE in include/asm-i386/pgtable.h. I had to use macros here because some types are not yet defined in all the include paths for this header. This patch is absolutely required for HIGHPTE kernels to operate properly with VMI. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] vmi: cpu cycles fix	Zachary Amsden	2007-03-05
\|	In order to share the common code in tsc.c which does CPU Khz calibration, we need to make an accurate value of CPU speed available to the tsc.c code. This value loses a lot of precision in a VM because of the timing differences with real hardware, but we need it to be as precise as possible so the guest can make accurate time calculations with the cycle counters. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] vmi: sched clock paravirt op fix	Zachary Amsden	2007-03-05
\|	The custom_sched_clock hook is broken. The result from sched_clock needs to be in nanoseconds, not in CPU cycles. The TSC is insufficient for this purpose, because TSC is poorly defined in a virtual environment, and mostly represents real world time instead of scheduled process time (which can be interrupted without notice when a virtual machine is descheduled). To make the scheduler consistent, we must expose a different nature of time, that is scheduled time. So deprecate this custom_sched_clock hack and turn it into a paravirt-op, as it should have been all along. This allows the tsc.c code which converts cycles to nanoseconds to be shared by all paravirt-ops backends. It is unfortunate to add a new paravirt-op, but this is a very distinct abstraction which is clearly different for all virtual machine implementations, and it gets rid of an ugly indirect function which I ashamedly admit I hacked in to try to get this to work earlier, and then even got in the wrong units. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
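The unit mix-up mentioned above (cycles versus nanoseconds) is easier to see with a concrete conversion. The sketch below turns a cycle count into nanoseconds with a fixed-point scale derived from the CPU frequency in kHz. The names, and the choice of a 10-bit shift, are assumptions made for this example and should not be read as the exact code in tsc.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point precision for this sketch. */
#define SCALE_SHIFT_SKETCH 10

/* Hypothetical helper: nanoseconds per cycle, scaled by 2^SCALE_SHIFT_SKETCH.
 * One cycle lasts 10^6 / cpu_khz nanoseconds. */
uint64_t cyc2ns_scale_sketch(uint32_t cpu_khz)
{
	assert(cpu_khz != 0);
	return ((uint64_t)1000000 << SCALE_SHIFT_SKETCH) / cpu_khz;
}

/* Hypothetical helper: convert a cycle count to nanoseconds using a
 * multiply and a shift instead of a division on every call. */
uint64_t cycles_to_ns_sketch(uint64_t cycles, uint64_t scale)
{
	/* Note: cycles * scale can overflow 64 bits for very large counts;
	 * the sketch does not guard against that. */
	return (cycles * scale) >> SCALE_SHIFT_SKETCH;
}
```

Precomputing the scale once keeps the per-call cost to a multiply and a shift, which matters for a function invoked as often as a scheduler clock.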
*	[PATCH] sched: remove SMT nice	Con Kolivas	2007-03-05
\|	Remove the SMT-nice feature which idles sibling cpus on SMT cpus to facilitate nice working properly where cpu power is shared. The idling of cpus in the presence of runnable tasks is considered too fragile, easy to break with outside code, and the complexity of managing this system if an architecture comes along with many logical cores sharing cpu power will be unworkable. Remove the associated per_cpu_gain variable in sched_domains used only by this code. Also: The reason is that with dynticks enabled, this code breaks without yet further tweaks so dynticks brought on the rapid demise of this code. So either we tweak this code or kill it off entirely. It was Ingo's preference to kill it off. Either way this needs to happen for 2.6.21 since dynticks has gone in. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] io_apic.h needs apicdef.h	Jean Delvare	2007-03-05
\|	A -mm patch caused: In file included from drivers/pci/quirks.c:532: include/asm/io_apic.h:61: error: "MAX_IO_APICS" undeclared here (not in a function) So let's include the needed header. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq	Linus Torvalds	2007-02-26
\|\	* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] constify some data tables. [CPUFREQ] constify cpufreq_driver where possible. {rd,wr}msr_on_cpu SMP=n optimization [CPUFREQ] cpufreq_ondemand.c: don't use _WORK_NAR rdmsr_on_cpu, wrmsr_on_cpu [CPUFREQ] Revert default on deprecated config X86_SPEEDSTEP_CENTRINO_ACPI
\| *	{rd,wr}msr_on_cpu SMP=n optimization	Adrian Bunk	2007-02-20
\| \|	Let's save a few bytes in the CONFIG_SMP=n case. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Dave Jones <davej@redhat.com>
\| *	rdmsr_on_cpu, wrmsr_on_cpu	Alexey Dobriyan	2007-02-20
\| \|	There was an OpenVZ-specific bug rendering some cpufreq drivers unusable on SMP. In short, when cpufreq code thinks it confined itself to needed cpu by means of set_cpus_allowed() to execute rdmsr, some "virtual cpu" feature can migrate process to anywhere. This triggers bugons and does wrong things in general. This got fixed by introducing rdmsr_on_cpu and wrmsr_on_cpu executing rdmsr and wrmsr on given physical cpu by means of smp_call_function_single(). Dave Jones mentioned cpufreq might not be the only user of rdmsr_on_cpu() and wrmsr_on_cpu(), so I'm putting them into arch/{i386,x86_64}/lib/ . Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com>
* \|	Revert "[PATCH] i386: add idle notifier"	Linus Torvalds	2007-02-26
\| \|	This reverts commit 2ff2d3d74705d34ab71b21f54634fcf50d57bdd5. Uwe Bugla reports that he cannot mount a floppy drive any more, and Jiri Slaby bisected it down to this commit. Benjamin LaHaise also points out that this is a big hot-path, and that interrupt delivery while idle is very common and should not go through all these expensive gyrations. Fix up conflicts in arch/i386/kernel/apic.c and arch/i386/kernel/irq.c due to other unrelated irq changes. Cc: Stephane Eranian <eranian@hpl.hp.com> Cc: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Uwe Bugla <uwe.bugla@gmx.de> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Pull misc-for-upstream into release branch	Len Brown	2007-02-16
\|\ \	Conflicts: drivers/usb/misc/appledisplay.c Signed-off-by: Len Brown <len.brown@intel.com>
\| * \|	ACPI: cleanup: make disable_acpi() valid w/o CONFIG_ACPI	Rusty Russell	2007-02-13
\| \| \|	Len Brown <lenb@kernel.org> said: > Okay, but better to use disable_acpi() > indeed, since this would be the first code not already inside CONFIG_ACPI > to invoke disable_acpi(), we could define the inline as empty and you could > then scratch the #ifdef too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Len Brown <len.brown@intel.com>
* \| \|	[PATCH] i386 rework local apic timer calibration	Thomas Gleixner	2007-02-16
\| \| \|	The local apic timer calibration has two problem cases: 1. The calibration is based on readout of the PIT/HPET timer to detect the wrap of the periodic tick. It happens that a box gets stuck in the calibration loop due to a PIT with a broken readout function. 2. CoreDuo boxen show a sporadic "PIT runs too slow" defect, which results in a wrong lapic calibration. The PIT goes back to normal operation once the lapic timer is switched to periodic mode. Both are existing and unfixed problems in the current upstream kernel and prevent certain laptops and other systems from booting Linux. Rework the code to address both problems: - Make the calibration interrupt driven. This removes the wait_timer_tick magic hackery from lapic.c and time_hpet.c. The clockevents framework allows easy substitution of the global tick event handler for the calibration. This is more accurate than monitoring jiffies. At this point of the boot process, nothing disturbs the interrupt delivery, so the results are very accurate. - Verify the calibration against the PM timer, when available by using the early access function. When the measured calibration period is outside of a one percent window, then the lapic timer calibration is adjusted to the pm timer result. - Verify the calibration by running the lapic timer with the calibration handler. Disable lapic timer in case of deviation. This also removes the "synchronization" of the local apic timer to the global tick. This synchronization never worked, as there is no way to synchronize PIT(HPET) and local APIC timer. The synchronization by waiting for the tick just aligns the local APIC timer for the first events, but later the events drift away due to the different clocks. Removing the "sync" is just randomizing the asynchronous behaviour at setup time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Zachary Amsden <zach@vmware.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Andi Kleen <ak@suse.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
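The one percent check against the PM timer can be sketched as follows. The ACPI PM timer runs at 3.579545 MHz, so the number of PM timer ticks expected during the calibration interval is known in advance; if the measured count falls outside the window, the lapic count is rescaled by the ratio. The function name, parameters and rounding are assumptions for illustration, not the kernel's calibration code.

```c
#include <assert.h>
#include <stdint.h>

/* Frequency of the ACPI PM timer in Hz. */
#define PMTMR_HZ_SKETCH 3579545u

/* Hypothetical helper.
 * lapic_delta: lapic timer ticks counted during the calibration interval
 * pm_delta:    PM timer ticks counted during the same interval
 * interval_ms: length the interval was supposed to have
 * Returns the lapic tick count to use for the nominal interval. */
uint64_t verify_lapic_delta_sketch(uint64_t lapic_delta, uint64_t pm_delta,
				   unsigned int interval_ms)
{
	uint64_t expected = (uint64_t)PMTMR_HZ_SKETCH * interval_ms / 1000;
	uint64_t low = expected - expected / 100;
	uint64_t high = expected + expected / 100;

	assert(pm_delta != 0);

	/* Within the one percent window: trust the measurement as is. */
	if (pm_delta >= low && pm_delta <= high)
		return lapic_delta;

	/* The reference tick was too slow or too fast, so the interval was
	 * really pm_delta / expected times as long as assumed. Rescale. */
	return lapic_delta * expected / pm_delta;
}
```

With a 100 ms interval the expected PM timer count is 357954 ticks. A PIT that runs at half speed would double both measured counts, and the rescaling brings the lapic value back to what a true 100 ms interval would have produced.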
* \| \|	[PATCH] clockevents: i386 drivers	Thomas Gleixner	2007-02-16
\| \| \|	Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update the timer IRQ to call into the PIT/HPET driver's event handler and the lapic-timer IRQ to call into the lapic clockevent driver. The assignment of timer functionality is delegated to the core framework code and replaces the compile and runtime evaluation in do_timer_interrupt_hook(). Use the clockevents broadcast support and implement the lapic_broadcast function for ACPI. No changes to existing functionality. [ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ] [ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ] Cleanups-from: Adrian Bunk <bunk@stusta.de> Build-fixes-from: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	[PATCH] i386, apic: clean up the APIC code	Thomas Gleixner	2007-02-16
\| \| \|	The apic code is quite unstructured and missing a lot of comments. - Restructure the code into helper functions, timer, setup/shutdown, interrupt and power management blocks. - Fixup comments. - Namespace fixups - Inline helpers for version and is_integrated - Combine the ack_bad_irq functions No functional changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Zachary Amsden <zach@vmware.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Andi Kleen <ak@suse.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	[PATCH] Mark TSC on GeodeLX reliable	Marcelo Tosatti	2007-02-16
\| \| \|	The Geode can safely use the TSC for highres, since: 1) Does not support frequency scaling, 2) The TSC _does_ count when the CPU is halted. Furthermore, the Geode supports a mode called "suspension on halt", where Suspend mode (which interacts with the power management states) is entered. TSC counting during suspend mode is controlled by bit 8 of the Bus Controller Configuration Register #0 (thanks Tom!). 3) no SMP :) Check if "RTSC counts during suspension" and remove the requirement for verification, so the clocksource code can safely select it as a timesource for the highres timers subsystem. Signed-off-by: Marcelo Tosatti <marcelo@kvack.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	[PATCH] x86: rewrite SMP TSC sync code	Ingo Molnar	2007-02-16
\| \| \|	Make the TSC synchronization code more robust, and unify it between x86_64 and i386. The biggest change is the removal of the 'fix up TSCs' code on x86_64 and i386: in some rare cases it was /causing/ time-warps on SMP systems. The new code only checks for TSC asynchronicity - and if it can prove a time-warp (if it can observe the TSC going backwards when going from one CPU to another within a critical section), then the TSC clock-source is turned off. The TSC synchronization-checking code also got moved into a separate file. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
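The comparison at the heart of such a warp check is small: every reading taken inside the critical section is compared with the most recent reading from any CPU, and a smaller value proves that time went backwards. The sketch below shows only that bookkeeping. The struct, the function name and the assumption that the caller already holds a lock are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state shared by all CPUs taking part in the check.
 * The caller is assumed to hold a lock around every call. */
struct tsc_warp_state_sketch {
	uint64_t last_tsc;   /* most recent reading seen on any CPU */
	uint64_t max_warp;   /* largest backwards step observed */
	unsigned int warps;  /* number of backwards steps observed */
};

/* Feed one TSC reading into the check. Returns 1 if the reading is
 * older than the previous one, i.e. time appeared to run backwards. */
int tsc_warp_observe_sketch(struct tsc_warp_state_sketch *st, uint64_t now)
{
	int warped = 0;

	assert(st != NULL);
	if (now < st->last_tsc) {
		uint64_t delta = st->last_tsc - now;

		if (delta > st->max_warp)
			st->max_warp = delta;
		st->warps++;
		warped = 1;
	}
	st->last_tsc = now;
	return warped;
}
```

A caller following the approach in the commit message would stop trusting the TSC as a clock source once the warp counter is non-zero, instead of trying to adjust the counters.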
* \| \|	[PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.	Rusty Russell	2007-02-13
\| \| \|	Extern declarations belong in headers. Times, they are a'changin. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>
* \| \|	[PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c	Rusty Russell	2007-02-13
\| \| \|	When I implemented the DECLARE_PER_CPU(var) macros, I was careful that people couldn't use "var" in a non-percpu context, by prepending percpu__. I never considered that this would allow them to overload the same name for a per-cpu and a non-percpu variable. It is only one of many horrors in the i386 boot code, but let's rename the non-percpu cpu_gdt_descr to early_gdt_descr (not boot_gdt_descr, that's something else...) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>
* \| \|	[PATCH] i386: Move mce_disabled to asm/mce.h	Rusty Russell	2007-02-13
\| \| \|	Allows external actors to disable mce. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>
* \| \|	[PATCH] i386: Remove fastcall in paravirt.[ch]	Andi Kleen	2007-02-13
\| \| \|	Not needed because fastcall is always default now Signed-off-by: Andi Kleen <ak@suse.de>
* \| \|	[PATCH] i386: improve sched_clock() on i686	Ingo Molnar	2007-02-13
\| \| \|	Clean up sched_clock() on i686: it will use the TSC if available and falls back to jiffies only if the user asked for it to be disabled via notsc or the CPU calibration code didn't figure out the right cpu_khz. This generally makes the scheduler timestamps more fine-grained, on all hardware. (the current scheduler is pretty resistant against asynchronous sched_clock() values on different CPUs, it will allow at most up to a jiffy of jitter.) Also simplify sched_clock()'s check for TSC availability: propagate the desire and ability to use the TSC into the tsc_disable flag, previously this flag only indicated whether the notsc option was passed. This makes the rare low-res sched_clock() codepath a single branch off a read-mostly flag. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
* \| \|	[PATCH] i386: add idle notifier	Stephane Eranian	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a notifier mechanism to the low level idle loop. You can register a callback function which gets invoked on entry and exit from the low level idle loop. The low level idle loop is defined as the polling loop, low-power call, or the mwait instruction. Interrupts processed by the idle thread are not considered part of the low level loop. The notifier can be used to measure precisely how much is spent in useless execution (or low power mode). The perfmon subsystem uses it to turn on/off monitoring. Signed-off-by: stephane eranian <eranian@hpl.hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
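A callback chain of the kind described above might look like the sketch below. It is a user-space model only: the `demo_` names, the event enum and the list layout are assumptions made for illustration and are not the interface this patch adds.

```c
#include <assert.h>
#include <stddef.h>

enum demo_idle_event { DEMO_IDLE_START, DEMO_IDLE_END };

/* One registered callback; entries form a singly linked list. */
struct demo_idle_notifier {
    void (*call)(enum demo_idle_event ev, void *data);
    void *data;
    struct demo_idle_notifier *next;
};

static struct demo_idle_notifier *demo_idle_chain;

/* Push a notifier onto the head of the chain. */
void demo_idle_notifier_register(struct demo_idle_notifier *n)
{
    assert(n != NULL && n->call != NULL);
    n->next = demo_idle_chain;
    demo_idle_chain = n;
}

/* Invoke every registered callback with the given event. */
static void demo_idle_notify(enum demo_idle_event ev)
{
    for (struct demo_idle_notifier *n = demo_idle_chain; n; n = n->next)
        n->call(ev, n->data);
}

/* Stand-in for one pass through the low-level idle loop. */
void demo_idle_once(void (*low_level_idle)(void))
{
    demo_idle_notify(DEMO_IDLE_START);
    if (low_level_idle)
        low_level_idle();   /* polling loop, low-power call, or mwait */
    demo_idle_notify(DEMO_IDLE_END);
}

/* Example callback: count entries to and exits from idle. */
struct demo_idle_stats { int entered; int exited; };

void demo_count_idle(enum demo_idle_event ev, void *data)
{
    struct demo_idle_stats *s = data;
    if (ev == DEMO_IDLE_START)
        s->entered++;
    else
        s->exited++;
}
```

A monitoring client would register one notifier and pair each start event with the following end event, for example to stop and restart a counter around the idle period.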
* \| \|	[PATCH] i386: Profile pc badness	Zachary Amsden	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Profile_pc was broken when using paravirtualization because the assumption the kernel was running at CPL 0 was violated, causing bad logic to read a random value off the stack. The only way to be in kernel lock functions is to be in kernel code, so validate that assumption explicitly by checking the CS value. We don't want to be fooled by BIOS / APM segments and try to read those stacks, so only match KERNEL_CS. I moved some stuff in segment.h to make it prettier. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de>
* \| \|	[PATCH] i386: vMI timer patches	Zachary Amsden	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VMI timer code. It works by taking over the local APIC clock when APIC is configured, which requires a couple hooks into the APIC code. The backend timer code could be commonized into the timer infrastructure, but there are some pieces missing (stolen time, in particular), and the exact semantics of when to do accounting for NO_IDLE need to be shared between different hypervisors as well. So for now, VMI timer is a separate module. [Adrian Bunk: cleanups] Subject: VMI timer patches Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
* \| \|	[PATCH] i386: vMI backend for paravirt-ops	Zachary Amsden	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fairly straightforward implementation of VMI backend for paravirt-ops. [Adrian Bunk: some cleanups] Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
* \| \|	[PATCH] i386: SMP boot hook for paravirt	Zachary Amsden	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add VMI SMP boot hook. We emulate a regular boot sequence and use the same APIC IPI initiation, we just poke magic values to load into the CPU state when the startup IPI is received, rather than having to jump through a real mode trampoline. This is all that was needed to get SMP to work. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
* \| \|	[PATCH] i386: paravirt CPU hypercall batching mode	Zachary Amsden	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The VMI ROM has a mode where hypercalls can be queued and batched. This turns out to be a significant win during context switch, but must be done at a specific point before side effects to CPU state are visible to subsequent instructions. This is similar to the MMU batching hooks already provided. The same hooks could be used by the Xen backend to implement a context switch multicall. To explain a bit more about lazy modes in the paravirt patches, basically, the idea is that only one of lazy CPU or MMU mode can be active at any given time. Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of multiple PTE updates (say, inside a remap loop), but to avoid keeping some kind of state machine about when to flush cpu or mmu updates, we just allow one or the other to be active. Although there is no real reason a more comprehensive scheme could not be implemented, there is also no demonstrated need for this extra complexity. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
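The "one lazy mode at a time" rule described above can be modelled with a single mode variable and a queue that is flushed on exit. The sketch below is a simplified user-space illustration; all `demo_` names, the queue size and the counters are hypothetical and do not correspond to the paravirt-ops or VMI interfaces.

```c
#include <assert.h>
#include <stddef.h>

enum demo_lazy_mode { DEMO_LAZY_NONE, DEMO_LAZY_MMU, DEMO_LAZY_CPU };

#define DEMO_QUEUE_MAX 16

static enum demo_lazy_mode demo_mode = DEMO_LAZY_NONE;
static int demo_queue[DEMO_QUEUE_MAX];
static size_t demo_queued;
static int demo_hypercalls;    /* simulated trips into the hypervisor */
static int demo_ops_applied;   /* operations that have taken effect */

/* Send everything queued so far as one batched call. */
static void demo_flush(void)
{
    if (demo_queued == 0)
        return;
    demo_hypercalls++;
    demo_ops_applied += (int)demo_queued;
    demo_queued = 0;
}

/* Enter a lazy mode; nesting or mixing modes is not allowed. */
void demo_enter_lazy(enum demo_lazy_mode mode)
{
    assert(mode != DEMO_LAZY_NONE);
    assert(demo_mode == DEMO_LAZY_NONE);
    demo_mode = mode;
}

/* Leave lazy mode: flush so later code sees all side effects. */
void demo_leave_lazy(void)
{
    demo_flush();
    demo_mode = DEMO_LAZY_NONE;
}

/* Issue one operation: immediate when not lazy, queued otherwise. */
void demo_issue(int op)
{
    if (demo_mode == DEMO_LAZY_NONE) {
        demo_hypercalls++;
        demo_ops_applied++;
        return;
    }
    if (demo_queued == DEMO_QUEUE_MAX)
        demo_flush();
    demo_queue[demo_queued++] = op;
}
```

Keeping a single mode variable avoids the state machine the message mentions: leaving lazy mode is the only point where a flush has to happen, apart from a full queue.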
* \| \|	[PATCH] MM: page allocation hooks for VMI backend	Zachary Amsden	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The VMI backend uses explicit page type notification to track shadow page tables. The allocation of page table roots is especially tricky. We need to clone the root for non-PAE mode while it is protected under the pgd lock to correctly copy the shadow. We don't need to allocate pgds in PAE mode, (PDPs in Intel terminology) as they only have 4 entries, and are cached entirely by the processor, which makes shadowing them rather simple. For base page table level allocation, pmd_populate provides the exact hook point we need. Also, we need to allocate pages when splitting a large page, and we must release pages before returning the page to any free pool. Despite being required with these slightly odd semantics for VMI, Xen also uses these hooks to determine the exact moment when page tables are created or released. AK: All nops for other architectures Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
* \| \|	[PATCH] i386: Convert i386 PDA code to use %fs	Jeremy Fitzhardinge	2007-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert the PDA code to use %fs rather than %gs as the segment for per-processor data. This is because some processors show a small but measurable performance gain for reloading a NULL segment selector (as %fs generally is in user-space) versus a non-NULL one (as %gs generally is). On modern processors the difference is very small, perhaps undetectable. Some old AMD "K6 3D+" processors are noticeably slower when %fs is used rather than %gs; I have no idea why this might be, but I think they're sufficiently rare that it doesn't matter much. This patch also fixes the math emulator, which had not been adjusted to match the changed struct pt_regs. [frederik.deweerdt@gmail.com: fixit with gdb] [mingo@elte.hu: Fix KVM too] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Ian Campbell <Ian.Campbell@XenSource.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Zachary Amsden <zach@vmware.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
* \| \|	[PATCH] i386: 2048-byte command line	Alon Bar-Lev	2007-02-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation allows the kernel to receive up to 255 characters from the bootloader, while the boot protocol allows larger buffers to be sent. In the current environment, the command line is used to specify many values, including suspend/resume, module arguments, splash, initramfs and more. 255 characters are not enough anymore. Now that the edd issue has been fixed and the dynamic kernel command-line patch has been accepted, we can extend COMMAND_LINE_SIZE without runtime memory requirements. Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	[PATCH] Numerous fixes to kernel-doc info in source files.	Robert P. J. Day	2007-02-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A variety of (mostly) innocuous fixes to the embedded kernel-doc content in source files, including: * make multi-line initial descriptions single line * denote some function names, constants and structs as such * change erroneous opening '/*' to '/**' in a few places * reword some text for clarity Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	[PATCH] fix sparse warnings from {asm,net}/checksum.h	Tilman Schmidt	2007-02-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rename the variable "sum" in the __range_ok macros to avoid name collisions causing lots of "symbol shadows an earlier one" warnings by sparse. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Andi Kleen <ak@suse.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Acked-by: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>