Commit message (Author, Age)
* KVM: vmx: Allow the guest to run with dirty debug registers (Paolo Bonzini, 2014-03-11)
| When not running in guest-debug mode (i.e. the guest controls the debug registers), having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest.
| If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we can let it write freely to the debug registers and reload them on the next exit. We still need to exit on the first access, so that the KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further accesses to the debug registers will not cause a vmexit.
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
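The exit-once behaviour described above can be modelled with a small toy, shown here as a sketch only. The struct, the function names and the flag values are assumptions made for illustration; only the flag name KVM_DEBUGREG_WONT_EXIT and the switch_db_regs field come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name taken from the commit text; the value is an assumption. */
#define KVM_DEBUGREG_WONT_EXIT 2u

/* Hypothetical, heavily reduced stand-in for the vCPU state. */
struct toy_vcpu {
    unsigned int switch_db_regs; /* bit mask, as in vcpu->arch.switch_db_regs */
    bool guest_debug;            /* true when userspace owns the debug registers */
    unsigned int dr_exits;       /* number of DR-access vmexits taken so far */
};

/* The guest touches a debug register. Returns true if that caused an exit. */
static bool toy_guest_dr_access(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (v->switch_db_regs & KVM_DEBUGREG_WONT_EXIT)
        return false;                 /* interception is off: no exit */
    v->dr_exits++;
    if (!v->guest_debug)              /* guest controls the DRs: stop intercepting */
        v->switch_db_regs |= KVM_DEBUGREG_WONT_EXIT;
    return true;
}

/* Any later vmexit: the registers may be dirty, so sync and re-arm interception. */
static void toy_vmexit(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (v->switch_db_regs & KVM_DEBUGREG_WONT_EXIT) {
        /* a real implementation would read the hardware DRs back here */
        v->switch_db_regs &= ~KVM_DEBUGREG_WONT_EXIT;
    }
}
```

In this model a burst of DR accesses between two exits costs one vmexit instead of one per access, which is the saving the commit message is after.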
* KVM: x86: Allow the guest to run with dirty debug registers (Paolo Bonzini, 2014-03-11)
| When not running in guest-debug mode, the guest controls the debug registers and having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest.
| After this patch, VMX- and SVM-specific code can set a flag in switch_db_regs, telling vcpu_enter_guest that on the next exit the debug registers might be dirty and need to be reloaded (syncing will be taken care of by a new callback in kvm_x86_ops). This flag can be set on the first access to a debug register, so that multiple accesses to the debug registers only cause one vmexit.
| Note that since the guest will be able to read debug registers and enable breakpoints in DR7, we need to ensure that they are synchronized on entry to the guest, including DR6, which was not synced before.
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: change vcpu->arch.switch_db_regs to a bit mask (Paolo Bonzini, 2014-03-11)
| The next patch will add another bit that we can test with the same "if".
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: vmx: we do rely on loading DR7 on entry (Paolo Bonzini, 2014-03-11)
| Currently, this works even if the bit is not in "min", because the bit is always set in MSR_IA32_VMX_ENTRY_CTLS. Mention it for the sake of documentation, and to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Remove return code from enable_irq/nmi_window (Jan Kiszka, 2014-03-11)
| It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, and with it the need to return an error code from both enable_irq_window and enable_nmi_window.
| Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt (Jan Kiszka, 2014-03-11)
| According to SDM 27.2.3, IDT vectoring information will not be valid on vmexits caused by external NMIs. So we have to avoid creating such scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we have a pending interrupt because that one would be migrated to L1's IDT vectoring info on nested exit.
| Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: Fully emulate preemption timer (Jan Kiszka, 2014-03-11)
| We cannot rely on the hardware-provided preemption timer support because we are holding L2 in HLT outside non-root mode. Furthermore, emulating the preemption timer will resolve tick rate errata on older Intel CPUs.
| The emulation is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. As we no longer rely on hardware features, we can enable both the preemption timer support and value saving unconditionally.
| Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
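An hrtimer-based emulation has to turn the guest-programmed timer value into a wall-clock interval. The sketch below shows one way to do that conversion. It assumes the timer ticks once every 2^5 TSC cycles; the rate shift, the function name and the parameters are hypothetical and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scaling for this sketch: one timer tick every 2^5 TSC cycles. */
#define TOY_PREEMPTION_TIMER_RATE_SHIFT 5

/* Convert a preemption timer value to nanoseconds for a TSC running at tsc_khz. */
static uint64_t toy_preemption_timer_ns(uint32_t timer_value, uint32_t tsc_khz)
{
    assert(tsc_khz != 0);
    uint64_t tsc_cycles = (uint64_t)timer_value << TOY_PREEMPTION_TIMER_RATE_SHIFT;
    /* cycles / (kHz * 1000) seconds equals cycles * 1000000 / kHz nanoseconds */
    return tsc_cycles * 1000000u / tsc_khz;
}
```

The resulting interval is what would be handed to a high-resolution timer on L2 entry; with a 32-bit timer value the intermediate product stays below 2^57, so the 64-bit arithmetic cannot overflow.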
* KVM: nVMX: Rework interception of IRQs and NMIs (Jan Kiszka, 2014-03-11)
| Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback has the task to switch from L2 to L1 if needed and inject the proper vmexit events.
| The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation.
| Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvm-s390-20140306' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next (Paolo Bonzini, 2014-03-06)
|\
| | One fix for virtio-ccw, fixing a problem introduced with "virtio_ccw: fix vcdev pointer handling issues" and noticed just after it went into git.
| * virtio_ccw: fix hang in set offline processing (Heinz Graalfs, 2014-03-06)
| | During set offline processing virtio_grab_drvdata() incorrectly calls dev_set_drvdata() to remove the virtio_ccw_device from the parent ccw_device's driver data. This is wrong and ends up in a hang during virtio_ccw_reset(), as the interrupt handler still has need of the virtio_ccw_device.
| | A new field 'going_away' is introduced in struct virtio_ccw_device to control the usage of the ccw_device's driver data pointer in virtio_grab_drvdata().
| | Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
| | Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| | Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
* | Merge tag 'kvm-for-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next (Paolo Bonzini, 2014-03-04)
|\ \
| * | ARM: KVM: fix warning in mmu.c (Marc Zyngier, 2014-03-02)
| | | Compiling with THP enabled leads to the following warning:
| | |     arch/arm/kvm/mmu.c: In function ‘unmap_range’:
| | |     arch/arm/kvm/mmu.c:177:39: warning: ‘pte’ may be used uninitialized in this function [-Wmaybe-uninitialized]
| | |     if (kvm_pmd_huge(*pmd) || page_empty(pte)) {
| | |                                         ^
| | | Code inspection reveals that these two cases are mutually exclusive, so GCC is a bit overzealous here. Silence it anyway by initializing pte to NULL and testing it later on.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | ARM: KVM: trap VM system registers until MMU and caches are ON (Marc Zyngier, 2014-03-02)
| | | In order to be able to detect the point where the guest enables its MMU and caches, trap all the VM related system registers.
| | | Once we see the guest enabling both the MMU and the caches, we can go back to a saner mode of operation, which is to leave these registers in complete control of the guest.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | ARM: KVM: add world-switch for AMAIR{0,1} (Marc Zyngier, 2014-03-02)
| | | HCR.TVM traps (among other things) accesses to AMAIR0 and AMAIR1. In order to minimise the amount of surprise a guest could generate by trying to access these registers with caches off, add them to the list of registers we switch/handle.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| * | ARM: KVM: introduce per-vcpu HYP Configuration Register (Marc Zyngier, 2014-03-02)
| | | So far, KVM/ARM used a fixed HCR configuration per guest, except for the VI/VF/VA bits to control the interrupt in absence of VGIC.
| | | With the upcoming need to dynamically reconfigure trapping, it becomes necessary to allow the HCR to be changed on a per-vcpu basis.
| | | The fix here is to mimic what KVM/arm64 already does: a per vcpu HCR field, initialized at setup time.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| * | ARM: KVM: fix ordering of 64bit coprocessor accesses (Marc Zyngier, 2014-03-02)
| | | Commit 240e99cbd00a (ARM: KVM: Fix 64-bit coprocessor handling) added an ordering dependency for the 64bit registers.
| | | The order described is: CRn, CRm, Op1, Op2, 64bit-first.
| | | Unfortunately, the implementation is: CRn, 64bit-first, CRm...
| | | Move the 64bit test to be last in order to match the documentation.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | Acked-by: Catalin Marinas <catalin.marinas@arm.com>
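The documented ordering (CRn, CRm, Op1, Op2, then 64bit-first as the last tie-break) can be expressed as a comparator. This is a minimal sketch with a hypothetical struct and function name, not the kernel's own table code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced register descriptor. */
struct toy_coproc_reg {
    unsigned int CRn, CRm, Op1, Op2;
    bool is_64;
};

/* qsort() comparator implementing: CRn, CRm, Op1, Op2, 64bit-first. */
static int toy_cmp_reg(const void *a, const void *b)
{
    const struct toy_coproc_reg *r1 = a, *r2 = b;

    assert(r1 != NULL && r2 != NULL);
    if (r1->CRn != r2->CRn)
        return r1->CRn < r2->CRn ? -1 : 1;
    if (r1->CRm != r2->CRm)
        return r1->CRm < r2->CRm ? -1 : 1;
    if (r1->Op1 != r2->Op1)
        return r1->Op1 < r2->Op1 ? -1 : 1;
    if (r1->Op2 != r2->Op2)
        return r1->Op2 < r2->Op2 ? -1 : 1;
    /* 64-bit entries sort first, but only once every other field is equal. */
    return (int)r2->is_64 - (int)r1->is_64;
}
```

Testing is_64 right after CRn, as the broken implementation did, would place a 64-bit entry with a larger CRm ahead of a 32-bit entry with a smaller CRm, so a table sorted one way could not be searched reliably with the other.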
| * | ARM: KVM: fix handling of trapped 64bit coprocessor accesses (Marc Zyngier, 2014-03-02)
| | | Commit 240e99cbd00a (ARM: KVM: Fix 64-bit coprocessor handling) changed the way we match the 64bit coprocessor access from user space, but didn't update the trap handler for the same set of registers.
| | | The effect is that a trapped 64bit access is never matched, leading to a fault being injected into the guest. This went unnoticed as we didn't really trap any 64bit register so far.
| | | Placing the CRm field of the access into the CRn field of the matching structure fixes the problem. Also update the debug feature to emit the expected string in case of failing match.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| * | ARM: KVM: force cache clean on page fault when caches are off (Marc Zyngier, 2014-03-02)
| | | In order for a guest with caches disabled to observe data contained in a given page, we need to make sure that page is committed to memory, and not just hanging in the cache (as guest accesses are completely bypassing the cache until it decides to enable it).
| | | For this purpose, hook into the coherent_cache_guest_page function and flush the region if the guest SCTLR register doesn't show the MMU and caches as being enabled.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| * | arm64: KVM: flush VM pages before letting the guest enable caches (Marc Zyngier, 2014-03-02)
| | | When the guest runs with caches disabled (like in an early boot sequence, for example), all the writes are directly going to RAM, bypassing the caches altogether.
| | | Once the MMU and caches are enabled, whatever sits in the cache becomes suddenly visible, which isn't what the guest expects.
| | | A way to avoid this potential disaster is to invalidate the cache when the MMU is being turned on. For this, we hook into the SCTLR_EL1 trapping code, and scan the stage-2 page tables, invalidating the pages/sections that have already been mapped in.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | ARM: KVM: introduce kvm_p*d_addr_end (Marc Zyngier, 2014-03-02)
| | | The use of p*d_addr_end with stage-2 translation is slightly dodgy, as the IPA is 40bits, while all the p*d_addr_end helpers are taking an unsigned long (arm64 is fine with that as unsigned long is 64bit).
| | | The fix is to introduce 64bit clean versions of the same helpers, and use them in the stage-2 page table code.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
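A 64bit clean "address end" helper might look like the sketch below. The function name, the 1 GiB top-level granule and the constants are assumptions chosen for illustration; the point is only that every intermediate value is a 64-bit type, so a 40-bit IPA above 4 GiB is not truncated on a 32-bit host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for the sketch: each top-level entry covers 1 GiB. */
#define TOY_PGDIR_SHIFT 30
#define TOY_PGDIR_SIZE  (UINT64_C(1) << TOY_PGDIR_SHIFT)
#define TOY_PGDIR_MASK  (~(TOY_PGDIR_SIZE - 1))

/* Return the next top-level boundary after addr, or end if that comes first. */
static uint64_t toy_kvm_pgd_addr_end(uint64_t addr, uint64_t end)
{
    assert(addr < end);
    uint64_t boundary = (addr + TOY_PGDIR_SIZE) & TOY_PGDIR_MASK;
    /* the "- 1" keeps the comparison correct if boundary wraps around to 0 */
    return (boundary - 1 < end - 1) ? boundary : end;
}
```

With a 32-bit unsigned long the same arithmetic would discard the upper bits of the IPA, which is the hazard the commit message describes.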
| * | arm64: KVM: trap VM system registers until MMU and caches are ON (Marc Zyngier, 2014-03-02)
| | | In order to be able to detect the point where the guest enables its MMU and caches, trap all the VM related system registers.
| | | Once we see the guest enabling both the MMU and the caches, we can go back to a saner mode of operation, which is to leave these registers in complete control of the guest.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | arm64: KVM: allows discrimination of AArch32 sysreg access (Marc Zyngier, 2014-03-02)
| | | The current handling of AArch32 trapping is slightly less than perfect, as it is not possible (from a handler point of view) to distinguish it from an AArch64 access, nor to tell a 32bit from a 64bit access either.
| | | Fix this by introducing two additional flags:
| | | - is_aarch32: true if the access was made in AArch32 mode
| | | - is_32bit: true if is_aarch32 == true and a MCR/MRC instruction was used to perform the access (as opposed to MCRR/MRRC).
| | | This allows a handler to cover all the possible conditions in which a system register gets trapped.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | arm64: KVM: force cache clean on page fault when caches are off (Marc Zyngier, 2014-03-02)
| | | In order for the guest with caches off to observe data contained in a given page, we need to make sure that page is committed to memory, and not just hanging in the cache (as guest accesses are completely bypassing the cache until it decides to enable it).
| | | For this purpose, hook into the coherent_icache_guest_page function and flush the region if the guest SCTLR_EL1 register doesn't show the MMU and caches as being enabled. The function also gets renamed to coherent_cache_guest_page.
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | | Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | kvm, vmx: Really fix lazy FPU on nested guest (Paolo Bonzini, 2014-02-27)
| | | Commit e504c9098ed6 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong.
| | | nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1.
| | | The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use it.
| | | Fixes: e504c9098ed6acd9e1079c5e10e4910724ad429f
| | | Reported-by: Kashyap Chamarty <kchamart@redhat.com>
| | | Reported-by: Stefan Bader <stefan.bader@canonical.com>
| | | Tested-by: Kashyap Chamarty <kchamart@redhat.com>
| | | Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
| | | Cc: stable@vger.kernel.org
| | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
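The difference between "CR0 as read by L2" and "CR0 in effect for L2" can be shown with a small sketch. The struct below is a hypothetical subset of the vmcs12 fields, and the helper names are illustrative; only guest_cr0, nested_read_cr0 and the TS bit are named in the commit message, and the read-shadow composition is the usual VMX rule assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define X86_CR0_TS (UINT64_C(1) << 3)   /* CR0.TS is bit 3 */

/* Hypothetical subset of the vmcs12 fields involved. */
struct toy_vmcs12 {
    uint64_t guest_cr0;            /* CR0 that L1 set up for running L2 */
    uint64_t cr0_read_shadow;      /* what L2 reads for the bits L1 owns */
    uint64_t cr0_guest_host_mask;  /* the bits L1 owns */
};

/* CR0 as L2 reads it: owned bits come from the shadow, the rest from guest_cr0. */
static uint64_t toy_nested_read_cr0(const struct toy_vmcs12 *v)
{
    assert(v != NULL);
    return (v->guest_cr0 & ~v->cr0_guest_host_mask) |
           (v->cr0_read_shadow & v->cr0_guest_host_mask);
}

/* Decide whether a device-not-available fault in L2 belongs to L1. */
static bool toy_nm_goes_to_l1(const struct toy_vmcs12 *v)
{
    assert(v != NULL);
    return (v->guest_cr0 & X86_CR0_TS) != 0;   /* use the effective value */
}
```

When L1 owns TS, shadows it as 0 and runs L2 with TS=1, the first helper reports TS clear while the second still routes the fault to L1, which is the case the earlier fix got wrong.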
| * | kvm: x86: fix emulator buffer overflow (CVE-2014-0049) (Andrew Honig, 2014-02-27)
| | | The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_needed is not set to 0.
| | | As a result, KVM exits to userspace, and then returns to complete_emulated_mmio. In complete_emulated_mmio vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and forth to userspace incrementing mmio_cur_fragment past its buffer. If the guest does nothing else it eventually leads to a crash on a memcpy from an invalid memory address.
| | | However if a guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution.
| | | Fixes: f78146b0f9230765c6315b2e14f56112513389ad
| | | Signed-off-by: Andrew Honig <ahonig@google.com>
| | | Cc: stable@vger.kernel.org (3.5+)
| | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
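Why the equality test never terminates can be seen in a small model. The sketch below is not the kernel code: the struct and helpers are hypothetical, and the ">=" comparison is shown as an assumption about what a defensive termination test looks like, since the commit message does not spell out the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of the two counters named in the commit message. */
struct toy_vcpu_mmio {
    int mmio_nr_fragments;
    int mmio_cur_fragment;
};

/* Termination test as described: exact equality, which can be stepped over. */
static bool toy_done_eq(const struct toy_vcpu_mmio *v)
{
    return v->mmio_cur_fragment == v->mmio_nr_fragments;
}

/* Defensive variant (assumed): also stops once the index has passed the count. */
static bool toy_done_ge(const struct toy_vcpu_mmio *v)
{
    return v->mmio_cur_fragment >= v->mmio_nr_fragments;
}

/* Count completion rounds until done() says stop, giving up after limit rounds. */
static int toy_rounds(bool (*done)(const struct toy_vcpu_mmio *), int limit)
{
    /* the state left behind by the later push: zero fragments recorded */
    struct toy_vcpu_mmio v = { .mmio_nr_fragments = 0, .mmio_cur_fragment = 0 };
    int rounds = 0;

    assert(limit > 0);
    while (rounds < limit) {
        v.mmio_cur_fragment++;   /* the completion path advances the index first */
        rounds++;
        if (done(&v))
            break;
    }
    return rounds;
}
```

With zero fragments recorded, the index is already 1 at the first check, so the equality test only stops when the artificial limit is hit, while the ordered comparison stops after one round.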
| * | arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT (Marc Zyngier, 2014-02-27)
| | | Commit 1fcf7ce0c602 (arm: kvm: implement CPU PM notifier) added support for CPU power-management, using a cpu_notifier to re-init KVM on a CPU that entered CPU idle.
| | | The code assumed that a CPU entering idle would actually be powered off, losing its state entirely, and would then need to be reinitialized. It turns out that this is not always the case, and some HW performs CPU PM without actually killing the core. In this case, we try to reinitialize KVM while it is still live. It ends up badly, as reported by Andre Przywara (using a Calxeda Midway):
| | |     [ 3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
| | |     [ 3.663897] unexpected data abort in Hyp mode at: 0xc067d150
| | |     [ 3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0
| | | The trick here is to detect if we've been through a full re-init or not by looking at HVBAR (VBAR_EL2 on arm64). This involves implementing the backend for __hyp_get_vectors in the main KVM HYP code (rather small), and checking the return value against the default one when the CPU notifier is called on CPU_PM_EXIT.
| | | Reported-by: Andre Przywara <osp@andrep.de>
| | | Tested-by: Andre Przywara <osp@andrep.de>
| | | Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
| | | Cc: Rob Herring <rob.herring@linaro.org>
| | | Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
| | | Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: drop read-only large sptes when creating lower level sptes (Marcelo Tosatti, 2014-02-26)
| | | Read-only large sptes can be created due to read-only faults as follows:
| | | - QEMU pagetable entry that maps guest memory is read-only due to COW.
| | | - Guest read faults such memory, COW is not broken, because it is a read-only fault.
| | | - Enable dirty logging, large spte not nuked because it is read-only.
| | | - Write-fault on such memory causes guest to loop endlessly (which must go down to level 1 because dirty logging is enabled).
| | | Fix by dropping large spte when necessary.
| | | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Linux 3.14-rc3 (Linus Torvalds, 2014-02-16)
| | |
| * | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs (Linus Torvalds, 2014-02-16)
| |\ \
| | | | Pull btrfs fixes from Chris Mason:
| | | | "We have a small collection of fixes in my for-linus branch. The big thing that stands out is a revert of a new ioctl. Users haven't shipped yet in btrfs-progs, and Dave Sterba found a better way to export the information"
| | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
| | | |     Btrfs: use right clone root offset for compressed extents
| | | |     btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
| | | |     Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
| | | |     Btrfs: fix max_inline mount option
| | | |     Btrfs: fix a lockdep warning when cleaning up aborted transaction
| | | |     Revert "btrfs: add ioctl to export size of global metadata reservation"
| | * | Btrfs: use right clone root offset for compressed extents (Filipe David Borba Manana, 2014-02-15)
| | | | For non compressed extents, iterate_extent_inodes() gives us offsets that take into account the data offset from the file extent items, while for compressed extents it doesn't. Therefore we have to adjust them before placing them in a send clone instruction. Not doing this adjustment leads to the receiving end requesting a wrong file range from the clone ioctl, which results in different file content from the one in the original send root.
| | | | Issue reproducible with the following excerpt from the test I made for xfstests:
| | | |     _scratch_mkfs
| | | |     _scratch_mount "-o compress-force=lzo"
| | | |     $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
| | | |     $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo
| | | |     $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
| | | |     $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
| | | |     $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
| | | |     $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
| | | |     $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo
| | | |     # will be used for incremental send to be able to issue clone operations
| | | |     $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap
| | | |     $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
| | | |     $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
| | | |     $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
| | | |         -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
| | | |     $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
| | | |         -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2
| | | |     $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
| | | |     $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
| | | |     $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
| | | |         -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap
| | | |     _scratch_unmount
| | | |     _scratch_mkfs
| | | |     _scratch_mount
| | | |     $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
| | | |     $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
| | | |     $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
| | | |     $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full
| | | |     $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
| | | |     $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
| | | | Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
| | | | Signed-off-by: Chris Mason <clm@fb.com>
| | * | btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105 (Anand Jain, 2014-02-15)
| | | | bdev is NULL when a disk has disappeared and the filesystem is mounted with the degrade option.
| | | | stack trace
| | | | ---------
| | | |     btrfs_sysfs_add_one+0x105/0x1c0 [btrfs]
| | | |     open_ctree+0x15f3/0x1fe0 [btrfs]
| | | |     btrfs_mount+0x5db/0x790 [btrfs]
| | | |     ? alloc_pages_current+0xa4/0x160
| | | |     mount_fs+0x34/0x1b0
| | | |     vfs_kern_mount+0x62/0xf0
| | | |     do_mount+0x22e/0xa80
| | | |     ? __get_free_pages+0x9/0x40
| | | |     ? copy_mount_options+0x31/0x170
| | | |     SyS_mount+0x7e/0xc0
| | | |     system_call_fastpath+0x16/0x1b
| | | | ---------
| | | | reproducer:
| | | | -------
| | | |     mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd
| | | |     (detach a disk)
| | | |     devmgt detach /dev/sdc [1]
| | | |     mount -o degrade /dev/sdd /btrfs
| | | | -------
| | | | [1] github.com/anajain/devmgt.git
| | | | Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
| | | | Tested-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
| | | | Signed-off-by: Chris Mason <clm@fb.com>
| | * | Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol (Josef Bacik, 2014-02-14)
| | | | A user was running into errors from an NFS export of a subvolume that had a default subvol set. When we mount a default subvol we will use d_obtain_alias() to find an existing dentry for the subvolume in the case that the root subvol has already been mounted, or a dummy one is allocated in the case that the root subvol has not already been mounted. This allows us to connect the dentry later on if we wander into the path. However if we don't ever wander into the path we will keep DCACHE_DISCONNECTED set for a long time, which angers NFS. It doesn't appear to cause any problems but it is annoying nonetheless, so simply unset DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to use d_materialise_unique() instead which will make everything play nicely together and reconnect stuff if we wander into the default subvol path from a different way. With this patch I'm no longer getting the NFS errors when exporting a volume that has been mounted with a default subvol set. Thanks,
| | | | cc: bfields@fieldses.org
| | | | cc: ebiederm@xmission.com
| | | | Signed-off-by: Josef Bacik <jbacik@fb.com>
| | | | Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
| | | | Signed-off-by: Chris Mason <clm@fb.com>
| | * | Btrfs: fix max_inline mount option (Mitch Harder, 2014-02-14)
| | | | Currently, the only mount option for max_inline that has any effect is max_inline=0. Any other value that is supplied to max_inline will be adjusted to a minimum of 4k. Since max_inline has an effective maximum of ~3900 bytes due to page size limitations, the current behaviour only has meaning for max_inline=0.
| | | | This patch will allow the max_inline mount option to accept non-zero values as indicated in the documentation.
| | | | Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
| | | | Signed-off-by: Chris Mason <clm@fb.com>
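The effect described above can be illustrated by comparing two clamping rules. This is a sketch under assumptions: the 4096-byte sector size, both function names, and the idea that the fix caps the value rather than raising it are not stated in the commit message and are used here only to make the behaviour concrete.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTORSIZE 4096u   /* assumed "4k" from the commit text */

/* Behaviour described as broken: any non-zero request is raised to at least 4k. */
static uint64_t toy_max_inline_old(uint64_t requested)
{
    if (requested == 0)
        return 0;
    return requested > TOY_SECTORSIZE ? requested : TOY_SECTORSIZE;
}

/* Assumed shape of the fix: cap at the sector size instead of raising to it. */
static uint64_t toy_max_inline_new(uint64_t requested)
{
    return requested < TOY_SECTORSIZE ? requested : TOY_SECTORSIZE;
}
```

Under the old rule a request such as max_inline=2048 silently becomes 4096, which is above the roughly 3900-byte effective maximum, so every non-zero setting behaves the same; under the capped rule 2048 is honoured.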
| | * | Btrfs: fix a lockdep warning when cleaning up aborted transaction (Liu Bo, 2014-02-14)
| | | | Given that we now have two spinlocks for managing delayed refs, CONFIG_DEBUG_SPINLOCK=y helped me find this:
| | | |     [ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
| | | |     [ 4723.414882] lock: 0xffff880048377670, .magic: dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
| | | |     [ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G W O 3.12.0+ #4
| | | |     [ 4723.421321] Call Trace:
| | | |     [ 4723.421872] [<ffffffff81680fe7>] dump_stack+0x54/0x74
| | | |     [ 4723.422753] [<ffffffff81681093>] spin_dump+0x8c/0x91
| | | |     [ 4723.424979] [<ffffffff816810b9>] spin_bug+0x21/0x26
| | | |     [ 4723.425846] [<ffffffff81323956>] do_raw_spin_unlock+0x66/0x90
| | | |     [ 4723.434424] [<ffffffff81689bf7>] _raw_spin_unlock+0x27/0x40
| | | |     [ 4723.438747] [<ffffffffa015da9e>] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
| | | |     [ 4723.443321] [<ffffffffa015df54>] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
| | | |     [ 4723.444692] [<ffffffff810c1b5d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
| | | |     [ 4723.450336] [<ffffffff810c1c2d>] ? trace_hardirqs_on+0xd/0x10
| | | |     [ 4723.451332] [<ffffffffa015e5ee>] transaction_kthread+0x22e/0x270 [btrfs]
| | | |     [ 4723.452543] [<ffffffffa015e3c0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
| | | |     [ 4723.457833] [<ffffffff81079efa>] kthread+0xea/0xf0
| | | |     [ 4723.458990] [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
| | | |     [ 4723.460133] [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
| | | |     [ 4723.460865] [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
| | | |     [ 4723.496521] ------------[ cut here ]------------
| | | | ----------------------------------------------------------------------
| | | | The reason is that we get to call cond_resched_lock(&head_ref->lock) while still holding @delayed_refs->lock.
| | | | This differs from __btrfs_run_delayed_refs(), where we do a drop-acquire dance before and after actually processing delayed refs.
| | | | Here we don't drop the lock, so others are not able to add new delayed refs to head_ref, and cond_resched_lock(&head_ref->lock) is not necessary.
| | | | Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
| | | | Signed-off-by: Chris Mason <clm@fb.com>
| | * | Revert "btrfs: add ioctl to export size of global metadata reservation" (Chris Mason, 2014-02-14)
| | | | This reverts commit 01e219e8069516cdb98594d417b8bb8d906ed30d.
| | | | David Sterba found a different way to provide these features without adding a new ioctl. We haven't released any progs with this ioctl yet, so I'm taking this out for now until we finalize things.
| | | | Signed-off-by: Chris Mason <clm@fb.com>
| | | | Signed-off-by: David Sterba <dsterba@suse.cz>
| | | | CC: Jeff Mahoney <jeffm@suse.com>
| * | | Merge tag 'dt-fixes-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux (Linus Torvalds, 2014-02-16)
| |\ \ \
| | | | | Pull devicetree fixes from Rob Herring:
| | | | | "Fix booting on PPC boards. Changes to of_match_node matching caused the serial port on some PPC boards to stop working. Reverted the change and reimplement to split matching between new style compatible only matching and fallback to old matching algorithm"
| | | | | * tag 'dt-fixes-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
| | | | |     of: search the best compatible match first in __of_match_node()
| | | | |     Revert "OF: base: match each node compatible against all given matches first"
| | * | | of: search the best compatible match first in __of_match_node() (Kevin Hao, 2014-02-15)
| | | | | Currently, of_match_node compares each given match against all node's compatible strings with of_device_is_compatible.
| | | | | To achieve multiple compatible strings per node with ordering from specific to generic, this requires given matches to be ordered from specific to generic. For most of the drivers this is not true and also an alphabetical ordering is more sane there.
| | | | | Therefore, this patch introduces a function to match each of the node's compatible strings against all given compatible matches without type and name first, before checking the next compatible string. This implies that node's compatibles are ordered from specific to generic while given matches can be in any order. If we fail to find such a match entry, then fall back to the old method in order to keep compatibility.
| | | | | Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
| | | | | Signed-off-by: Kevin Hao <haokexin@gmail.com>
| | | | | Tested-by: Stephen Chivers <schivers@csc.com>
| | | | | Signed-off-by: Rob Herring <robh@kernel.org>
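The two-pass lookup described above can be sketched as follows. The struct, the function name and the simplified fallback are hypothetical; the sketch ignores the name field and is not the kernel's __of_match_node(), it only shows the loop order that makes the node's most specific compatible string win.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced match entry. */
struct toy_match {
    const char *compatible;  /* NULL when the entry matches on type only */
    const char *type;        /* NULL when unused */
    int data;
};

/* The table is terminated by an entry with both compatible and type NULL. */
static const struct toy_match *
toy_match_node(const struct toy_match *table,
               const char *const *node_compat, size_t n_compat,
               const char *node_type)
{
    const struct toy_match *m;
    size_t i;

    assert(table != NULL);

    /* Pass 1: node compatibles, most specific first, against compatible-only entries. */
    for (i = 0; i < n_compat; i++)
        for (m = table; m->compatible || m->type; m++)
            if (m->compatible && !m->type &&
                strcmp(m->compatible, node_compat[i]) == 0)
                return m;

    /* Pass 2: simplified stand-in for the old method, walking the table in order. */
    for (m = table; m->compatible || m->type; m++) {
        int ok = 1;
        if (m->type)
            ok = node_type != NULL && strcmp(m->type, node_type) == 0;
        if (ok && m->compatible) {
            ok = 0;
            for (i = 0; i < n_compat; i++)
                if (strcmp(m->compatible, node_compat[i]) == 0)
                    ok = 1;
        }
        if (ok)
            return m;
    }
    return NULL;
}
```

Because the outer loop runs over the node's strings, the order of the driver's match table no longer matters for compatible-only entries, and a type-only entry is reached only when no compatible string matched.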
| | * | | Revert "OF: base: match each node compatible against all given matches first"  Kevin Hao  2014-02-14
| | | | |
| | | | | This reverts commit 105353145eafb3ea919f5cdeb652a9d8f270228e.
| | | | |
| | | | | Stephen Chivers reported this is broken as we will get a match
| | | | | entry '.type = "serial"' instead of the '.compatible = "ns16550"'
| | | | | in the following scenario:
| | | | |     serial0: serial@4500 {
| | | | |         compatible = "fsl,ns16550", "ns16550";
| | | | |     }
| | | | |
| | | | |     struct of_device_id of_platform_serial_table[] = {
| | | | |         { .compatible = "ns8250",   .data = (void *)PORT_8250, },
| | | | |         { .compatible = "ns16450",  .data = (void *)PORT_16450, },
| | | | |         { .compatible = "ns16550a", .data = (void *)PORT_16550A, },
| | | | |         { .compatible = "ns16550",  .data = (void *)PORT_16550, },
| | | | |         { .compatible = "ns16750",  .data = (void *)PORT_16750, },
| | | | |         { .compatible = "ns16850",  .data = (void *)PORT_16850, },
| | | | |         ...
| | | | |         { .type = "serial",         .data = (void *)PORT_UNKNOWN, },
| | | | |         { /* end of list */ },
| | | | |     };
| | | | |
| | | | | So just revert this patch, we will use another implementation to find
| | | | | the best compatible match in a follow-on patch.
| | | | |
| | | | | Reported-by: Stephen N Chivers <schivers@csc.com.au>
| | | | | Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
| | | | | Signed-off-by: Kevin Hao <haokexin@gmail.com>
| | | | | Signed-off-by: Rob Herring <robh@kernel.org>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending  Linus Torvalds  2014-02-15
| |\ \ \ \
| | | | | | Pull SCSI target fixes from Nicholas Bellinger:
| | | | | |  "Mostly minor fixes this time to v3.14-rc1 related changes.  Also
| | | | | |   included is one fix for a free after use regression in persistent
| | | | | |   reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"
| | | | | |
| | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
| | | | | |   Target/sbc: Fix protection copy routine
| | | | | |   IB/srpt: replace strict_strtoul() with kstrtoul()
| | | | | |   target: Simplify command completion by removing CMD_T_FAILED flag
| | | | | |   iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
| | | | | |   iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
| | | | | |   target: Fix missing length check in spc_emulate_evpd_83()
| | | | | |   qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
| | | | | |   target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
| | | | | |   target: Fix free-after-use regression in PR unregister
| | * | | | Target/sbc: Fix protection copy routine  Sagi Grimberg  2014-02-12
| | | | | |
| | | | | | Need to take into account that protection sg_list (copy-buffer)
| | | | | | may consist of multiple entries.
| | | | | |
| | | | | | Changes from v0:
| | | | | | - Changed commit description
| | | | | |
| | | | | | Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * | | | IB/srpt: replace strict_strtoul() with kstrtoul()  Jingoo Han  2014-02-12
| | | | | |
| | | | | | The usage of strict_strtoul() is not preferred, because
| | | | | | strict_strtoul() is obsolete. Thus, kstrtoul() should be
| | | | | | used.
| | | | | |
| | | | | | Signed-off-by: Jingoo Han <jg1.han@samsung.com>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
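For readers unfamiliar with the strict-parsing contract behind kstrtoul() (0 on success, a negative errno on malformed or out-of-range input, one trailing newline tolerated), a rough user-space analogue built on the C library's strtoul() might look like the sketch below. `parse_ulong_strict()` is a hypothetical helper written for illustration; it is not kernel code and may differ from kstrtoul() in corner cases.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical user-space helper: parse the whole string as an unsigned
 * long. Returns 0 on success, -EINVAL for malformed input and -ERANGE on
 * overflow. A single trailing newline is accepted.
 */
int parse_ulong_strict(const char *s, int base, unsigned long *res)
{
    char *end;
    unsigned long val;

    assert(s && res);

    /* strtoul() would silently skip whitespace and accept '-'. */
    if (*s == '\0' || isspace((unsigned char)*s) || *s == '-')
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, base);
    if (end == s)
        return -EINVAL;     /* no digits consumed */
    if (errno == ERANGE)
        return -ERANGE;     /* value does not fit in unsigned long */
    if (*end == '\n')
        end++;              /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;     /* trailing garbage */

    *res = val;
    return 0;
}
```

The point of the strict variant is that input such as "42abc" is rejected outright instead of being silently truncated to 42.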
| | * | | | target: Simplify command completion by removing CMD_T_FAILED flag  Roland Dreier  2014-02-12
| | | | | |
| | | | | | The CMD_T_FAILED flag is set in one place to record the result of a
| | | | | | trivial test, and it is only tested once, a few lines later. We might
| | | | | | as well make the code simpler and easier to read by directly doing the
| | | | | | test of "success" where we want to use it.
| | | | | |
| | | | | | Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * | | | iser-target: Fix leak on failure in isert_conn_create_fastreg_pool  Nicholas Bellinger  2014-02-12
| | | | | |
| | | | | | This patch fixes a memory leak for fr_desc upon failure of
| | | | | | isert_create_fr_desc() in isert_conn_create_fastreg_pool() code.
| | | | | |
| | | | | | As reported by Coverity 1166659:
| | | | | |
| | | | | | *** CID 1166659:  Resource leak  (RESOURCE_LEAK)
| | | | | | /drivers/infiniband/ulp/isert/ib_isert.c: 470 in isert_conn_create_fastreg_pool()
| | | | | | 464      isert_conn, isert_conn->conn_fr_pool_size);
| | | | | | 465
| | | | | | 466      return 0;
| | | | | | 467
| | | | | | 468     err:
| | | | | | 469      isert_conn_free_fastreg_pool(isert_conn);
| | | | | | >>>     CID 1166659:  Resource leak  (RESOURCE_LEAK)
| | | | | | >>>     Variable "fr_desc" going out of scope leaks the storage it points to.
| | | | | | 470      return ret;
| | | | | | 471     }
| | | | | | 472
| | | | | | 473     static int
| | | | | | 474     isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
| | | | | | 475     {
| | | | | |
| | | | | | Cc: Sagi Grimberg <sagig@mellanox.com>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
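The class of leak Coverity flags here can be reproduced in a small user-space sketch: an object is allocated inside a loop, its setup fails before it is linked into the pool, and the shared error path only frees what is already on the list. The code below is an assumption-laden illustration, not the isert code. `struct desc`, `struct pool`, `desc_setup()`, `pool_create()`, `pool_free()` and the `live_descs` counter are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct desc {
    struct desc *next;
};

struct pool {
    struct desc *head;
    int size;
};

/* Outstanding descriptor allocations; exists only to make leaks visible. */
int live_descs;

/* Hypothetical per-descriptor setup that can be told to fail. */
int desc_setup(struct desc *d, int should_fail)
{
    (void)d;
    return should_fail ? -1 : 0;
}

/* Frees every descriptor that has been linked into the pool. */
void pool_free(struct pool *p)
{
    assert(p);
    while (p->head) {
        struct desc *d = p->head;

        p->head = d->next;
        free(d);
        live_descs--;
    }
    p->size = 0;
}

/* Builds a pool of n descriptors; setup of descriptor fail_at fails. */
int pool_create(struct pool *p, int n, int fail_at)
{
    int i, ret;

    assert(p);
    for (i = 0; i < n; i++) {
        struct desc *d = calloc(1, sizeof(*d));

        if (!d) {
            ret = -1;
            goto err;
        }
        live_descs++;

        ret = desc_setup(d, i == fail_at);
        if (ret) {
            /*
             * d is not on the list yet, so pool_free() below cannot
             * see it. Without this free(), d would leak.
             */
            free(d);
            live_descs--;
            goto err;
        }
        d->next = p->head;
        p->head = d;
        p->size++;
    }
    return 0;

err:
    pool_free(p);
    return ret;
}
```

Removing the `free(d)` in the failure branch leaves `live_descs` at 1 after a failed `pool_create()`, which is the user-space equivalent of the reported leak.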
| | * | | | iscsi-target: Fix SNACK Type 1 + BegRun=0 handling  Nicholas Bellinger  2014-02-12
| | | | | |
| | | | | | This patch fixes Status SNACK handling of BegRun=0 to allow for all
| | | | | | unacknowledged responses to be resent, instead of always assuming
| | | | | | that BegRun would be an explicit value less than the current
| | | | | | ExpStatSN.
| | | | | |
| | | | | | Reported-by: santosh kulkarni <santosh.kulkarni@calsoftinc.com>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * | | | target: Fix missing length check in spc_emulate_evpd_83()  Roland Dreier  2014-02-12
| | | | | |
| | | | | | Commit fbfe858fea2a ("target_core_spc: Include target device
| | | | | | descriptor in VPD page 83") added a new length variable, but (due to a
| | | | | | cut and paste mistake?) just checks scsi_name_len against 256 twice.
| | | | | |
| | | | | | Fix this to check scsi_target_len for overflow too.
| | | | | |
| | | | | | Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
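The copy-and-paste pattern described above (two length variables, only one of them actually bounds-checked) can be illustrated with a short user-space sketch. `DESC_MAX` and `format_descriptors()` are hypothetical and only mirror the "256" limit mentioned in the message. The real spc_emulate_evpd_83() formats its descriptors differently.

```c
#include <assert.h>
#include <stdio.h>

#define DESC_MAX 256 /* hypothetical bound, mirroring the 256 in the message */

/*
 * Formats two descriptors into separate buffers and checks each resulting
 * length against the bound. Returns the combined length, or -1 if either
 * descriptor would not fit.
 */
int format_descriptors(const char *name, const char *target,
                       char name_buf[DESC_MAX], char target_buf[DESC_MAX])
{
    int name_len, target_len;

    assert(name && target && name_buf && target_buf);

    name_len = snprintf(name_buf, DESC_MAX, "%s", name);
    target_len = snprintf(target_buf, DESC_MAX, "%s", target);

    if (name_len < 0 || name_len >= DESC_MAX)
        return -1;
    /* The buggy pattern would test name_len a second time here. */
    if (target_len < 0 || target_len >= DESC_MAX)
        return -1;

    return name_len + target_len;
}
```

If the second check tested `name_len` again, an over-long target string would pass validation and be silently truncated.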
| | * | | | qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list  Roland Dreier  2014-02-12
| | | | | |
| | | | | | The only place this struct member is touched is in one INIT_LIST_HEAD.
| | | | | |
| | | | | | Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| | * | | | target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div  Nicholas Bellinger  2014-02-12
| | | | | |
| | | | | | This patch changes core_alua_state_lba_dependent() to use do_div()
| | | | | | instead of sector_div() to avoid the following link error on 32-bit
| | | | | | with CONFIG_LBDAF=n as reported by Jim:
| | | | | |
| | | | | | buildlog-1391099072.txt-drivers/built-in.o: In function `target_alua_state_check':
| | | | | | buildlog-1391099072.txt-(.text+0x928d93): undefined reference to `__umoddi3'
| | | | | | buildlog-1391099072.txt:make: *** [vmlinux] Error 1
| | | | | | --
| | | | | | buildlog-1391101753.txt-  CC      init/version.o
| | | | | | buildlog-1391101753.txt-  LD      init/built-in.o
| | | | | | buildlog-1391101753.txt-drivers/built-in.o: In function `core_alua_state_lba_dependent':
| | | | | | buildlog-1391101753.txt-/home/jim/linux/drivers/target/target_core_alua.c:503: undefined reference to `__umoddi3'
| | | | | | buildlog-1391101753.txt:make: *** [vmlinux] Error 1
| | | | | |
| | | | | | Reported-by: Jim Davis <jim.epost@gmail.com>
| | | | | | Cc: Hannes Reinecke <hare@suse.de>
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
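The undefined `__umoddi3` reference arises because a plain 64-bit modulo on a 32-bit target is compiled into a call to a compiler support routine that the kernel does not link against. do_div() avoids that by dividing a 64-bit value by a 32-bit divisor in place and returning the remainder. The sketch below imitates that calling convention in portable user-space C using shift-and-subtract long division. `div64_by_32()` is a hypothetical name, and the kernel's real do_div() is an architecture-specific macro, not this loop.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divides *n by base in place and returns the remainder, using only
 * shifts, compares and subtractions on the 64-bit value (no 64-bit
 * '/' or '%' operators).
 */
uint32_t div64_by_32(uint64_t *n, uint32_t base)
{
    uint64_t rem = 0, quot = 0;

    assert(n && base != 0);

    /* Classic binary long division, most significant bit first. */
    for (int bit = 63; bit >= 0; bit--) {
        rem = (rem << 1) | ((*n >> bit) & 1);
        if (rem >= base) {
            rem -= base;
            quot |= (uint64_t)1 << bit;
        }
    }
    *n = quot;
    return (uint32_t)rem;
}
```

Since `rem` is always smaller than `base` before each shift, it never exceeds 33 bits, so the intermediate value cannot overflow.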
| | * | | | target: Fix free-after-use regression in PR unregister  Nicholas Bellinger  2014-02-12
| | | | | |
| | | | | | This patch addresses a >= v3.11 free-after-use regression in
| | | | | | core_scsi3_emulate_pro_register() that was introduced in the
| | | | | | following commit:
| | | | | |
| | | | | | commit bc118fe4c4a8cfa453491ba77c0a146a6d0e73e0
| | | | | | Author: Andy Grover <agrover@redhat.com>
| | | | | | Date:   Thu May 16 10:41:04 2013 -0700
| | | | | |
| | | | | |     target: Further refactoring of core_scsi3_emulate_pro_register()
| | | | | |
| | | | | | To avoid the free-after-use, save a type value beforehand, and only
| | | | | | call core_scsi3_put_pr_reg() with a valid *pr_reg.
| | | | | |
| | | | | | Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
| | | | | | Cc: Andy Grover <agrover@redhat.com>
| | | | | | Cc: <stable@vger.kernel.org> #3.11+
| | | | | | Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
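The remedy described in this message (copy the field you still need before dropping the reference, and only drop the reference on a valid pointer) is a general pattern that can be shown in a few lines of user-space C. Everything below is a hypothetical illustration: `struct reg`, `reg_new()`, `reg_put()`, `unregister_and_get_type()` and the `regs_freed` counter are not taken from the target code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted registration object. */
struct reg {
    int refcount;
    int type;
};

/* Number of objects released so far; exists only for the demonstration. */
int regs_freed;

struct reg *reg_new(int type)
{
    struct reg *r = malloc(sizeof(*r));

    if (r) {
        r->refcount = 1;
        r->type = type;
    }
    return r;
}

/* Drops one reference; the object is freed when the count reaches zero. */
void reg_put(struct reg *r)
{
    assert(r && r->refcount > 0);
    if (--r->refcount == 0) {
        free(r);
        regs_freed++;
    }
}

/*
 * Releases the registration and reports the type it had.
 * Returns -1 if there is no registration to release.
 */
int unregister_and_get_type(struct reg *r)
{
    int type;

    if (!r)
        return -1;      /* never call reg_put() on an invalid pointer */

    type = r->type;     /* save the value while r is still alive */
    reg_put(r);         /* r may be freed here */
    return type;        /* use the saved copy, not r->type */
}
```

Reading `r->type` after `reg_put(r)` would be the free-after-use. Saving it first makes the order of operations safe regardless of whether the put was the last reference.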
| * | | | | Merge branch 'i2c/for-current' of ↵  Linus Torvalds  2014-02-15
| |\ \ \ \ \  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
| | | | | | |
| | | | | | | Pull i2c fixes from Wolfram Sang:
| | | | | | |  "i2c has a bugfix and documentation improvements for you"
| | | | | | |
| | | | | | | * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
| | | | | | |   Documentation: i2c: mention ACPI method for instantiating devices
| | | | | | |   Documentation: i2c: describe devicetree method for instantiating devices
| | | | | | |   i2c: mv64xxx: refactor message start to ensure proper initialization
| | * | | | | Documentation: i2c: mention ACPI method for instantiating devices  Wolfram Sang  2014-02-15
| | | | | | |
| | | | | | | Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| | | | | | | Acked-by: Guenter Roeck <linux@roeck-us.net>