aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* KVM: MMU: Fix hugepage pdes mapping same physical address with different accessAvi Kivity2007-05-03
| | | | | | | | | | | | | | | | | | | The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map the same physical address, they share the same shadow page. This is a fairly common case (kernel mappings on i386 nonpae Linux, for example). However, if the two pdes map the same memory but with different permissions, kvm will happily use the cached shadow page. If the access through the more permissive pde will occur after the access to the strict pde, an endless pagefault loop will be generated and the guest will make no progress. Fix by making the access permissions part of the cache lookup key. The fix allows Xen pae to boot on kvm and run guest domains. Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: forbid guest to execute monitor/mwaitJoerg Roedel2007-05-03
| | | | | | | | | | This patch forbids the guest to execute monitor/mwait instructions on SVM. This is necessary because the guest can execute these instructions if they are available even if the kvm cpuid doesn't report its existence. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Handle writes to MCG_STATUS msrSergey Kiselev2007-05-03
| | | | | | | | | | Some older (~2.6.7) kernels write MCG_STATUS register during kernel boot (mce_clear_all() function, called from mce_init()). It's not currently handled by kvm and will cause it to inject a GPF. Following patch adds a "nop" handler for this. Signed-off-by: Sergey Kiselev <sergey.kiselev@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove unused and write-only variablesAvi Kivity2007-05-03
| | | | | | Trivial cleanup. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Don't allow the guest to turn off the cpu cacheAvi Kivity2007-05-03
| | | | | | | The cpu cache is a host resource; the guest should not be able to turn it off (even for itself). Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Hack real-mode segments on vmx from KVM_SET_SREGSAvi Kivity2007-05-03
| | | | | | | | | | As usual, we need to mangle segment registers when emulating real mode as vm86 has specific constraints. We special case the reset segment base, and set the "access rights" (or descriptor flags) to vm86 comaptible values. This fixes reboot on vmx. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Modify guest segments after potentially switching modesAvi Kivity2007-05-03
| | | | | | | | | The SET_SREGS ioctl modifies both cr0.pe (real mode/protected mode) and guest segment registers. Since segment handling is modified by the mode on Intel procesors, update the segment registers after the mode switch has taken place. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove set_cr0_no_modeswitch() arch opAvi Kivity2007-05-03
| | | | | | | | | set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers. As we now cache the protected mode values on entry to real mode, this isn't an issue anymore, and it interferes with reboot (which usually _is_ a modeswitch). Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Workaround vmx inability to virtualize the reset stateAvi Kivity2007-05-03
| | | | | | | | | | | | | | The reset state has cs.selector == 0xf000 and cs.base == 0xffff0000, which aren't compatible with vm86 mode, which is used for real mode virtualization. When we create a vcpu, we set cs.base to 0xf0000, but if we get there by way of a reset, the values are inconsistent and vmx refuses to enter guest mode. Workaround by detecting the state and munging it appropriately. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Remove global pte trackingAvi Kivity2007-05-03
| | | | | | | | | | | The initial, noncaching, version of the kvm mmu flushed the all nonglobal shadow page table translations (much like a native tlb flush). The new implementation flushes translations only when they change, rendering global pte tracking superfluous. This removes the unused tracking mechanism and storage space. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Remove unnecessary check for pdptr accessAvi Kivity2007-05-03
| | | | | | We already special case the pdptr access, so no need to check it again. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Avoid guest virtual addresses in string pio userspace interfaceAvi Kivity2007-05-03
| | | | | | | | | | | | | | | The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Future-proof argument-less ioctlsAvi Kivity2007-05-03
| | | | | | | Some ioctls ignore their arguments. By requiring them to be zero now, we allow a nonzero value to have some special meaning in the future. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow kernel to select size of mmap() bufferAvi Kivity2007-05-03
| | | | | | | | This allows us to store offsets in the kernel/user kvm_run area, and be sure that userspace has them mapped. As offsets can be outside the kvm_run struct, userspace has no way of knowing how much to mmap. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add guest mode signal maskAvi Kivity2007-05-03
| | | | | | | | | Allow a special signal mask to be used while executing in guest mode. This allows signals to be used to interrupt a vcpu without requiring signal delivery to a userspace handler, which is quite expensive. Userspace still receives -EINTR and can get the signal via sigwait(). Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Initialize the apic_base msr on svm tooAvi Kivity2007-05-03
| | | | | | | Older userspace didn't care, but newer userspace (with the cpuid changes) does. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add a special exit reason when exiting due to an interruptAvi Kivity2007-05-03
| | | | | | | | This is redundant, as we also return -EINTR from the ioctl, but it allows us to examine the exit_reason field on resume without seeing old data. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fold kvm_run::exit_type into kvm_run::exit_reasonAvi Kivity2007-05-03
| | | | | | | | | Currently, userspace is told about the nature of the last exit from the guest using two fields, exit_type and exit_reason, where exit_type has just two enumerations (and no need for more). So fold exit_type into exit_reason, reducing the complexity of determining what really happened. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow userspace to process hypercalls which have no kernel handlerAvi Kivity2007-05-03
| | | | | | This is useful for paravirtualized graphics devices, for example. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add method to check for backwards-compatible API extensionsAvi Kivity2007-05-03
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Renumber ioctlsAvi Kivity2007-05-03
| | | | | | The recent changes have left the ioctl numbers in complete disarray. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove minor wart from KVM_CREATE_VCPU ioctlAvi Kivity2007-05-03
| | | | | | | That ioctl does not transfer any data, so it should be an _IO rather than an _IOW. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove the 'emulated' field from the userspace interfaceAvi Kivity2007-05-03
| | | | | | | We no longer emulate single instructions in userspace. Instead, we service mmio or pio requests. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Handle cpuid in the kernel instead of punting to userspaceAvi Kivity2007-05-03
| | | | | | | | | | | | | | KVM used to handle cpuid by letting userspace decide what values to return to the guest. We now handle cpuid completely in the kernel. We still let userspace decide which values the guest will see by having userspace set up the value table beforehand (this is necessary to allow management software to set the cpu features to the least common denominator, so that live migration can work). The motivation for the change is that kvm kernel code can be impacted by cpuid features, for example the x86 emulator. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Do not communicate to userspace through cpu registers during PIOAvi Kivity2007-05-03
| | | | | | | | | | | | | Currently when passing the a PIO emulation request to userspace, we rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx (on string instructions). This (a) requires two extra ioctls for getting and setting the registers and (b) is unfriendly to non-x86 archs, when they get kvm ports. So fix by doing the register fixups in the kernel and passing to userspace only an abstract description of the PIO to be done. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use a shared page for kernel/user communication when runing a vcpuAvi Kivity2007-05-03
| | | | | | | | | Instead of passing a 'struct kvm_run' back and forth between the kernel and userspace, allocate a page and allow the user to mmap() it. This reduces needless copying and makes the interface expandable by providing lots of free space. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix bogus sign extension in mmu mapping auditAvi Kivity2007-05-03
| | | | | | | | | | When auditing a 32-bit guest on a 64-bit host, sign extension of the page table directory pointer table index caused bogus addresses to be shown on audit errors. Fix by declaring the index unsigned. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Export <linux/kvm.h>Avi Kivity2007-05-03
| | | | | | | This allows users to actually build prgrams that use kvm without the entire source tree. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use own minor numberAvi Kivity2007-05-03
| | | | | | Use the minor number (232) allocated to kvm by lanana. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use the generic skip_emulated_instruction() in hypercall codeDor Laor2007-05-03
| | | | | | | | Instead of twiddling the rip registers directly, use the skip_emulated_instruction() function to do that for us. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix guest register corruption on paravirt hypercallDor Laor2007-05-03
| | | | | | | | The hypercall code mixes up the ->cache_regs() and ->decache_regs() callbacks, resulting in guest register corruption. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* libata: honour host controllers that want just one hostLinus Torvalds2007-04-30
| | | | | | | | | | | | | | | | | | | | The Marvell IDE interface on my machine would hit a BUG_ON() in lib/iomem.c because it was calling ata_pci_init_one() specifying just a single port on the host, but that would actually end up trying to initialize two ports, the second one with bogus information. This fixes "ata_pci_init_one()" so that it actually passes down the n_ports variable that it got from the low-level driver to the host allocation routine ("ata_host_alloc_pinfo()"), which results in the ATA layer actually having the correct port number information. And in order to make it all work, I also needed to fix a few places that had incorrectly hard-coded the fact that a host always had exactly two ports (both ata_pci_init_bmdma() and ata_request_legacy_irqs() would just always iterate over both ports). Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pm: include EIO from errno-base.hDavid Rientjes2007-04-30
| | | | | | | | | | | For backwards compatibility, call_platform_enable_wakeup() can return 0 instead of -EIO since we aren't guaranteed to have errno defined. Cc: David Brownell <david-b@pacbell.net> Signed-off-by: David Rientjes <rientjes@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add kvasprintf()Jeremy Fitzhardinge2007-04-30
| | | | | | | | | | | | | Add a kvasprintf() function to complement kasprintf(). No in-tree users yet, but I have some coming up. [akpm@linux-foundation.org: EXPORT it] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Keir Fraser <keir@xensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* power management: force pm_ops.valid callback to be assignedJohannes Berg2007-04-30
| | | | | | | | | | | | | | | | | | This patch changes the docs and behaviour from "all states valid" to "no states valid" if no .valid callback is assigned. Users of pm_ops that only need mem sleep can assign pm_valid_only_mem without any overhead, others will require more elaborate callbacks. Now that all users of pm_ops have a .valid callback this is a safe thing to do and prevents things from getting messy again as they were before. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl> Cc: <linux-pm@lists.linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* power management: implement pm_ops.valid for everybodyJohannes Berg2007-04-30
| | | | | | | | | | | | | | | | | | | | | | Almost all users of pm_ops only support mem sleep, don't check in .valid and don't reject any others in .prepare so users can be confused if they check /sys/power/state, especially when new states are added (these would then result in s-t-r although they're supposed to be something different). This patch implements a generic pm_valid_only_mem function that is then exported for users and puts it to use in almost all existing pm_ops. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: linux-pm@lists.linux-foundation.org Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* power management: remove firmware disk modeJohannes Berg2007-04-30
| | | | | | | | | | | | | | | | | | This patch removes the firmware disk suspend mode which is the wrong approach, it is supposed to be used for implementing firmware-based disk suspend but cannot actually be used for that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: David Brownell <david-b@pacbell.net> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rework pm_ops pm_disk_mode, kill misuseJohannes Berg2007-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch series cleans up some misconceptions about pm_ops. Some users of the pm_ops structure attempt to use it to stop the user from entering suspend to disk, this, however, is not possible since the user can always use "shutdown" in /sys/power/disk and then the pm_ops are never invoked. Also, platforms that don't support suspend to disk simply should not allow configuring SOFTWARE_SUSPEND (read the help text on it, it only selects suspend to disk and nothing else, all the other stuff depends on PM). The pm_ops structure is actually intended to provide a way to enter platform-defined sleep states (currently supported states are "standby" and "mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured) allows a platform to support a platform specific way to enter low-power mode once everything has been saved to disk. This is currently only used by ACPI (S4). This patch: The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really seems to understand what it actually does. This patch clarifies the pm_disk_mode description. It also removes all the arm and sh users that think they can veto suspend to disk via pm_ops; not so since the user can always do echo shutdown > /sys/power/disk, they need to find a better way involving Kconfig or such. ACPI is the only user left with a non-zero pm_disk_mode. The patch also sets the default mode to shutdown again, but when a new pm_ops is registered its pm_disk_mode is selected as default, that way the default stays for ACPI where it is apparently required. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* reiserfs: suppress lockdep warningJeff Mahoney2007-04-30
| | | | | | | | | | | | | We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix. The xattr_sem can never be taken in the manner described. Internal inodes are protected by I_PRIVATE. Add the appropriate annotation. Cc: <stable@kernel.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Extend print_symbol capabilityRobert Peterson2007-04-30
| | | | | | | | | | | | | | | | | | | | | Today's print_symbol function dumps a kernel symbol with printk. This patch extends the functionality of kallsyms.c so that the symbol lookup function may be used without the printk. This is useful for modules that want to dump symbols elsewhere, for example, to debugfs. I intend to use the new function call in the GFS2 file system (which will be a separate patch). [akpm@linux-foundation.org: build fix] [clameter@sgi.com: sprint_symbol should return length of string like sprintf] Signed-off-by: Robert Peterson <rpeterso@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [UDP]: Do not allow specific bind when wildcard bind exists.David S. Miller2007-04-30
| | | | | | | | | | When allocating local ports, do not allow a bind to a port with a specific local address when a bind to that port with a wildcard local address already exists. Noticed by Linus. Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] UDP: Fix endianness bugs in hashing changes.David S. Miller2007-04-30
| | | | | | | | I accidently applied an earlier version of Eric Dumazet's patch, from March 21st. His version from March 30th didn't have these bugs, so this just interdiffs to the correct patch. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2007-04-30
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (56 commits) ieee1394: remove garbage from Kconfig ieee1394: more help in Kconfig ieee1394: ohci1394: Fix mistake in printk message. ieee1394: ohci1394: remove unnecessary rcvPhyPkt bit flipping in LinkControl register ieee1394: ohci1394: fix cosmetic problem in error logging ieee1394: eth1394: send async streams at S100 on 1394b buses ieee1394: eth1394: fix error path in module_init ieee1394: eth1394: correct return codes in hard_start_xmit ieee1394: eth1394: hard_start_xmit is called in atomic context ieee1394: eth1394: some conditions are unlikely ieee1394: eth1394: clean up fragment_overlap ieee1394: eth1394: don't use alloc_etherdev ieee1394: eth1394: omit useless set_mac_address callback ieee1394: eth1394: CONFIG_INET is always defined ieee1394: eth1394: allow MTU bigger than 1500 ieee1394: unexport highlevel_host_reset ieee1394: eth1394: contain host reset ieee1394: eth1394: shorter error messages ieee1394: eth1394: correct a memset argument ieee1394: eth1394: refactor .probe and .update ...
| * ieee1394: remove garbage from KconfigStefan Richter2007-04-29
| | | | | | | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * ieee1394: more help in KconfigStefan Richter2007-04-29
| | | | | | | | | | | | | | | | - s/Device Drivers/Controllers/ - clarify who needs pcilynx - don't recommend Y for raw1394; M is typically used Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * ieee1394: ohci1394: Fix mistake in printk message.Simon Arlott2007-04-29
| | | | | | | | | | | | | | | | Fix the "attempting to setting" message in ohci1394. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * ieee1394: ohci1394: remove unnecessary rcvPhyPkt bit flipping in LinkControl ↵Bernhard Kauer2007-04-29
| | | | | | | | | | | | | | | | | | | | register Remove the unneeded code that clears, sets and again clears the rcvPhyPkt bit in the ohci1394 LinkControl register in ohci_initialize(). Signed-off-by: Bernhard Kauer <kauer@os.inf.tu-dresden.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * ieee1394: ohci1394: fix cosmetic problem in error loggingStefan Richter2007-04-29
| | | | | | | | | | | | | | If posted write failed, an "Unhandled interrupt(s) 0x00000100" message was logged by mistake. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * ieee1394: eth1394: send async streams at S100 on 1394b busesStefan Richter2007-04-29
| | | | | | | | | | | | | | eth1394 did not work on buses consisting of S100B...S400B hardware because it attempted to send GASP packets at S800. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * ieee1394: eth1394: fix error path in module_initAkinobu Mita2007-04-29
| | | | | | | | | | | | | | | | | | | | This patch fixes some error handlings in eth1394: - check return value of kmem_cache_create() - cleanup resources if hpsb_register_protocol() fails Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (whitespace)