path: root/arch
author    Linus Torvalds <torvalds@linux-foundation.org>  2015-06-22 20:59:09 -0400
committer Linus Torvalds <torvalds@linux-foundation.org>  2015-06-22 20:59:09 -0400
commit    d70b3ef54ceaf1c7c92209f5a662a670d04cbed9 (patch)
tree      0f38109c1cabe9e2df028041c1e30f36c803ec5b /arch
parent    650ec5a6bd5df4ab0c9ef38d05b94cd82fb99ad8 (diff)
parent    7ef3d7d58d9dc73ee3d4f8f56d0024c8cca8163f (diff)
Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
 "There were so many changes in the x86/asm, x86/apic and x86/mm topics
  in this cycle that the topical separation of -tip broke down somewhat -
  so the result is a more traditional architecture pull request, collected
  into the 'x86/core' topic. The topics were still maintained separately
  as far as possible, so bisectability and conceptual separation should
  still be pretty good - but there were a handful of merge points to avoid
  excessive dependencies (and conflicts) that would have been poorly
  tested in the end. The next cycle will hopefully be much more quiet (or
  at least will have fewer dependencies).

  The main changes in this cycle were:

   * x86/apic changes, with related IRQ core changes (Jiang Liu, Thomas
     Gleixner):

      - This is the second and most intrusive part of the changes to the
        x86 interrupt handling - full conversion to hierarchical interrupt
        domains:

           [IOAPIC domain]   -----
                                  |
           [MSI domain]      --------[Remapping domain] ----- [ Vector domain ]
                                  |   (optional)          |
           [HPET MSI domain] -----                        |
                                  |                       |
           [DMAR domain]     -----------------------------
                                  |
           [Legacy domain]   -----------------------------

        This now reflects the actual hardware and allowed us to
        disentangle the domain-specific code from the underlying parent
        domain, which can be optional in the case of interrupt remapping.
        It's a clear separation of functionality and removes quite some
        duct tape constructs which plugged the remap code between
        ioapic/msi/hpet and the vector management.

      - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
        injection into guests (Feng Wu)

   * x86/asm changes:

      - Tons of cleanups, small speedups and micro-optimizations. This is
        in preparation for moving a good chunk of the low level entry code
        from assembly to C (Denys Vlasenko, Andy Lutomirski, Brian Gerst)

      - Moved all system entry related code to a new home under
        arch/x86/entry/ (Ingo Molnar)

      - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
        Conversion to C will reintroduce many of them - but meanwhile they
        are only getting in the way, and the upstream kernel does not rely
        on them (Ingo Molnar)

      - NOP handling refinements (Borislav Petkov)

   * x86/mm changes:

      - Big PAT and MTRR rework: making the code more robust and preparing
        to phase out exposing direct MTRR interfaces to drivers - in favor
        of PAT driven interfaces (Toshi Kani, Luis R Rodriguez, Borislav
        Petkov)

      - New ioremap_wt()/set_memory_wt() interfaces to support
        Write-Through cached memory mappings. This is especially important
        for good performance on NVDIMM hardware (Toshi Kani)

   * x86/ras changes:

      - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

        This is an important RAS feature which adds hardware support for
        poisoned data: the hardware marks data which it has detected as
        corrupted but wasn't able to correct as poisoned, and raises an
        APIC interrupt to signal that in the form of a deferred error. It
        is then the OS's responsibility to take proper recovery action and
        thus prolong system lifetime as far as possible.

      - Add support for Intel "Local MCE"s: upcoming CPUs will support
        CPU-local MCE interrupts, as opposed to the traditional
        system-wide broadcast MCE interrupts (Ashok Raj)

      - Misc cleanups (Borislav Petkov)

   * x86/platform changes:

      - Intel Atom SoC updates

  ... and lots of other cleanups, fixlets and other changes - see the
  shortlog and the Git log for details"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
  x86/hpet: Use proper hpet device number for MSI allocation
  x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
  x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
  x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
  x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
  genirq: Prevent crash in irq_move_irq()
  genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
  iommu, x86: Properly handle posted interrupts for IOMMU hotplug
  iommu, x86: Provide irq_remapping_cap() interface
  iommu, x86: Setup Posted-Interrupts capability for Intel iommu
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Avoid migrating VT-d posted interrupts
  iommu, x86: Save the mode (posted or remapped) of an IRTE
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  iommu: dmar: Provide helper to copy shared irte fields
  iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
  iommu: Add new member capability to struct irq_remap_ops
  x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
  x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
  x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
  ...
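For reference, the new ioremap_wt() interface mentioned above (and wired up as a fallback in the per-arch io.h hunks below) is used like the existing ioremap_wc()/ioremap_nocache() helpers. The following is a minimal, hypothetical driver sketch - the physical address, size and symbol names are invented for illustration and are not part of this patch set; on architectures without real write-through support, the fallbacks in this series simply alias it to an uncached mapping:

    /* Hypothetical sketch: map a small persistent-memory style region write-through. */
    #include <linux/init.h>
    #include <linux/io.h>
    #include <linux/module.h>

    #define EXAMPLE_PHYS_BASE	0xfed00000UL	/* made-up physical address */
    #define EXAMPLE_SIZE	0x1000UL	/* made-up region size */

    static void __iomem *example_base;

    static int __init example_init(void)
    {
    	/* Write-Through: reads may be cached, writes go straight to the medium. */
    	example_base = ioremap_wt(EXAMPLE_PHYS_BASE, EXAMPLE_SIZE);
    	if (!example_base)
    		return -ENOMEM;

    	/* The usual MMIO accessors still apply to the returned cookie. */
    	writel(0x1, example_base);
    	return 0;
    }

    static void __exit example_exit(void)
    {
    	iounmap(example_base);
    }

    module_init(example_init);
    module_exit(example_exit);
    MODULE_LICENSE("GPL");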
Diffstat (limited to 'arch')
-rw-r--r--arch/arc/include/asm/io.h1
-rw-r--r--arch/arm/include/asm/io.h1
-rw-r--r--arch/arm64/include/asm/io.h1
-rw-r--r--arch/avr32/include/asm/io.h1
-rw-r--r--arch/frv/include/asm/io.h4
-rw-r--r--arch/ia64/include/asm/irq_remapping.h2
-rw-r--r--arch/ia64/kernel/msi_ia64.c30
-rw-r--r--arch/m32r/include/asm/io.h1
-rw-r--r--arch/m68k/include/asm/io_mm.h4
-rw-r--r--arch/m68k/include/asm/io_no.h4
-rw-r--r--arch/metag/include/asm/io.h3
-rw-r--r--arch/microblaze/include/asm/io.h2
-rw-r--r--arch/mn10300/include/asm/io.h1
-rw-r--r--arch/nios2/include/asm/io.h1
-rw-r--r--arch/s390/include/asm/io.h1
-rw-r--r--arch/sparc/include/asm/io_32.h1
-rw-r--r--arch/sparc/include/asm/io_64.h1
-rw-r--r--arch/tile/include/asm/io.h2
-rw-r--r--arch/x86/Kbuild5
-rw-r--r--arch/x86/Kconfig231
-rw-r--r--arch/x86/Kconfig.debug11
-rw-r--r--arch/x86/Makefile23
-rw-r--r--arch/x86/entry/Makefile10
-rw-r--r--arch/x86/entry/calling.h (renamed from arch/x86/include/asm/calling.h)98
-rw-r--r--arch/x86/entry/entry_32.S1248
-rw-r--r--arch/x86/entry/entry_64.S (renamed from arch/x86/kernel/entry_64.S)1125
-rw-r--r--arch/x86/entry/entry_64_compat.S556
-rw-r--r--arch/x86/entry/syscall_32.c (renamed from arch/x86/kernel/syscall_32.c)6
-rw-r--r--arch/x86/entry/syscall_64.c (renamed from arch/x86/kernel/syscall_64.c)0
-rw-r--r--arch/x86/entry/syscalls/Makefile (renamed from arch/x86/syscalls/Makefile)4
-rw-r--r--arch/x86/entry/syscalls/syscall_32.tbl (renamed from arch/x86/syscalls/syscall_32.tbl)0
-rw-r--r--arch/x86/entry/syscalls/syscall_64.tbl (renamed from arch/x86/syscalls/syscall_64.tbl)0
-rw-r--r--arch/x86/entry/syscalls/syscallhdr.sh (renamed from arch/x86/syscalls/syscallhdr.sh)0
-rw-r--r--arch/x86/entry/syscalls/syscalltbl.sh (renamed from arch/x86/syscalls/syscalltbl.sh)0
-rw-r--r--arch/x86/entry/thunk_32.S (renamed from arch/x86/lib/thunk_32.S)15
-rw-r--r--arch/x86/entry/thunk_64.S (renamed from arch/x86/lib/thunk_64.S)46
-rw-r--r--arch/x86/entry/vdso/.gitignore (renamed from arch/x86/vdso/.gitignore)0
-rw-r--r--arch/x86/entry/vdso/Makefile (renamed from arch/x86/vdso/Makefile)0
-rwxr-xr-xarch/x86/entry/vdso/checkundef.sh (renamed from arch/x86/vdso/checkundef.sh)0
-rw-r--r--arch/x86/entry/vdso/vclock_gettime.c (renamed from arch/x86/vdso/vclock_gettime.c)0
-rw-r--r--arch/x86/entry/vdso/vdso-layout.lds.S (renamed from arch/x86/vdso/vdso-layout.lds.S)0
-rw-r--r--arch/x86/entry/vdso/vdso-note.S (renamed from arch/x86/vdso/vdso-note.S)0
-rw-r--r--arch/x86/entry/vdso/vdso.lds.S (renamed from arch/x86/vdso/vdso.lds.S)0
-rw-r--r--arch/x86/entry/vdso/vdso2c.c (renamed from arch/x86/vdso/vdso2c.c)0
-rw-r--r--arch/x86/entry/vdso/vdso2c.h (renamed from arch/x86/vdso/vdso2c.h)0
-rw-r--r--arch/x86/entry/vdso/vdso32-setup.c (renamed from arch/x86/vdso/vdso32-setup.c)0
-rw-r--r--arch/x86/entry/vdso/vdso32/.gitignore (renamed from arch/x86/vdso/vdso32/.gitignore)0
-rw-r--r--arch/x86/entry/vdso/vdso32/int80.S (renamed from arch/x86/vdso/vdso32/int80.S)0
-rw-r--r--arch/x86/entry/vdso/vdso32/note.S (renamed from arch/x86/vdso/vdso32/note.S)0
-rw-r--r--arch/x86/entry/vdso/vdso32/sigreturn.S (renamed from arch/x86/vdso/vdso32/sigreturn.S)0
-rw-r--r--arch/x86/entry/vdso/vdso32/syscall.S (renamed from arch/x86/vdso/vdso32/syscall.S)0
-rw-r--r--arch/x86/entry/vdso/vdso32/sysenter.S (renamed from arch/x86/vdso/vdso32/sysenter.S)0
-rw-r--r--arch/x86/entry/vdso/vdso32/vclock_gettime.c (renamed from arch/x86/vdso/vdso32/vclock_gettime.c)0
-rw-r--r--arch/x86/entry/vdso/vdso32/vdso-fakesections.c (renamed from arch/x86/vdso/vdso32/vdso-fakesections.c)0
-rw-r--r--arch/x86/entry/vdso/vdso32/vdso32.lds.S (renamed from arch/x86/vdso/vdso32/vdso32.lds.S)0
-rw-r--r--arch/x86/entry/vdso/vdsox32.lds.S (renamed from arch/x86/vdso/vdsox32.lds.S)0
-rw-r--r--arch/x86/entry/vdso/vgetcpu.c (renamed from arch/x86/vdso/vgetcpu.c)0
-rw-r--r--arch/x86/entry/vdso/vma.c (renamed from arch/x86/vdso/vma.c)0
-rw-r--r--arch/x86/entry/vsyscall/Makefile7
-rw-r--r--arch/x86/entry/vsyscall/vsyscall_64.c (renamed from arch/x86/kernel/vsyscall_64.c)0
-rw-r--r--arch/x86/entry/vsyscall/vsyscall_emu_64.S (renamed from arch/x86/kernel/vsyscall_emu_64.S)0
-rw-r--r--arch/x86/entry/vsyscall/vsyscall_gtod.c (renamed from arch/x86/kernel/vsyscall_gtod.c)0
-rw-r--r--arch/x86/entry/vsyscall/vsyscall_trace.h (renamed from arch/x86/kernel/vsyscall_trace.h)2
-rw-r--r--arch/x86/ia32/Makefile2
-rw-r--r--arch/x86/ia32/ia32entry.S611
-rw-r--r--arch/x86/include/asm/alternative-asm.h18
-rw-r--r--arch/x86/include/asm/apic.h6
-rw-r--r--arch/x86/include/asm/asm.h25
-rw-r--r--arch/x86/include/asm/atomic.h30
-rw-r--r--arch/x86/include/asm/atomic64_64.h8
-rw-r--r--arch/x86/include/asm/cacheflush.h6
-rw-r--r--arch/x86/include/asm/dwarf2.h170
-rw-r--r--arch/x86/include/asm/entry_arch.h5
-rw-r--r--arch/x86/include/asm/frame.h7
-rw-r--r--arch/x86/include/asm/hardirq.h4
-rw-r--r--arch/x86/include/asm/hpet.h16
-rw-r--r--arch/x86/include/asm/hw_irq.h140
-rw-r--r--arch/x86/include/asm/io.h9
-rw-r--r--arch/x86/include/asm/io_apic.h114
-rw-r--r--arch/x86/include/asm/irq.h4
-rw-r--r--arch/x86/include/asm/irq_remapping.h80
-rw-r--r--arch/x86/include/asm/irq_vectors.h51
-rw-r--r--arch/x86/include/asm/irqdomain.h63
-rw-r--r--arch/x86/include/asm/mce.h28
-rw-r--r--arch/x86/include/asm/msi.h7
-rw-r--r--arch/x86/include/asm/msr-index.h (renamed from arch/x86/include/uapi/asm/msr-index.h)2
-rw-r--r--arch/x86/include/asm/msr.h12
-rw-r--r--arch/x86/include/asm/mtrr.h15
-rw-r--r--arch/x86/include/asm/paravirt_types.h7
-rw-r--r--arch/x86/include/asm/pat.h9
-rw-r--r--arch/x86/include/asm/pci.h5
-rw-r--r--arch/x86/include/asm/pgtable.h8
-rw-r--r--arch/x86/include/asm/pgtable_types.h3
-rw-r--r--arch/x86/include/asm/proto.h10
-rw-r--r--arch/x86/include/asm/special_insns.h38
-rw-r--r--arch/x86/include/asm/thread_info.h8
-rw-r--r--arch/x86/include/asm/topology.h2
-rw-r--r--arch/x86/include/asm/trace/irq_vectors.h6
-rw-r--r--arch/x86/include/asm/traps.h3
-rw-r--r--arch/x86/include/asm/uaccess_32.h4
-rw-r--r--arch/x86/include/asm/x86_init.h21
-rw-r--r--arch/x86/include/uapi/asm/msr.h2
-rw-r--r--arch/x86/include/uapi/asm/mtrr.h8
-rw-r--r--arch/x86/kernel/Makefile5
-rw-r--r--arch/x86/kernel/acpi/boot.c73
-rw-r--r--arch/x86/kernel/acpi/wakeup_64.S6
-rw-r--r--arch/x86/kernel/alternative.c9
-rw-r--r--arch/x86/kernel/apb_timer.c4
-rw-r--r--arch/x86/kernel/apic/htirq.c173
-rw-r--r--arch/x86/kernel/apic/io_apic.c1303
-rw-r--r--arch/x86/kernel/apic/msi.c417
-rw-r--r--arch/x86/kernel/apic/vector.c448
-rw-r--r--arch/x86/kernel/apic/x2apic_phys.c2
-rw-r--r--arch/x86/kernel/asm-offsets.c2
-rw-r--r--arch/x86/kernel/asm-offsets_64.c2
-rw-r--r--arch/x86/kernel/cpu/amd.c6
-rw-r--r--arch/x86/kernel/cpu/common.c16
-rw-r--r--arch/x86/kernel/cpu/intel_cacheinfo.c8
-rw-r--r--arch/x86/kernel/cpu/mcheck/mce.c50
-rw-r--r--arch/x86/kernel/cpu/mcheck/mce_amd.c141
-rw-r--r--arch/x86/kernel/cpu/mcheck/mce_intel.c44
-rw-r--r--arch/x86/kernel/cpu/mshyperv.c6
-rw-r--r--arch/x86/kernel/cpu/mtrr/cleanup.c3
-rw-r--r--arch/x86/kernel/cpu/mtrr/generic.c209
-rw-r--r--arch/x86/kernel/cpu/mtrr/main.c48
-rw-r--r--arch/x86/kernel/cpu/mtrr/mtrr.h2
-rw-r--r--arch/x86/kernel/crash.c1
-rw-r--r--arch/x86/kernel/devicetree.c41
-rw-r--r--arch/x86/kernel/early-quirks.c8
-rw-r--r--arch/x86/kernel/entry_32.S1401
-rw-r--r--arch/x86/kernel/head_32.S4
-rw-r--r--arch/x86/kernel/head_64.S4
-rw-r--r--arch/x86/kernel/hpet.c50
-rw-r--r--arch/x86/kernel/i8259.c8
-rw-r--r--arch/x86/kernel/irq.c62
-rw-r--r--arch/x86/kernel/irq_32.c6
-rw-r--r--arch/x86/kernel/irq_64.c6
-rw-r--r--arch/x86/kernel/irq_work.c10
-rw-r--r--arch/x86/kernel/irqinit.c10
-rw-r--r--arch/x86/kernel/machine_kexec_64.c1
-rw-r--r--arch/x86/kernel/mpparse.c7
-rw-r--r--arch/x86/kernel/paravirt.c4
-rw-r--r--arch/x86/kernel/paravirt_patch_64.c1
-rw-r--r--arch/x86/kernel/process_32.c5
-rw-r--r--arch/x86/kernel/process_64.c3
-rw-r--r--arch/x86/kernel/setup.c3
-rw-r--r--arch/x86/kernel/smp.c19
-rw-r--r--arch/x86/kernel/smpboot.c43
-rw-r--r--arch/x86/kernel/traps.c21
-rw-r--r--arch/x86/kernel/x86_init.c9
-rw-r--r--arch/x86/lguest/boot.c4
-rw-r--r--arch/x86/lib/Makefile3
-rw-r--r--arch/x86/lib/atomic64_386_32.S7
-rw-r--r--arch/x86/lib/atomic64_cx8_32.S61
-rw-r--r--arch/x86/lib/checksum_32.S52
-rw-r--r--arch/x86/lib/clear_page_64.S7
-rw-r--r--arch/x86/lib/cmpxchg16b_emu.S12
-rw-r--r--arch/x86/lib/cmpxchg8b_emu.S11
-rw-r--r--arch/x86/lib/copy_page_64.S11
-rw-r--r--arch/x86/lib/copy_user_64.S127
-rw-r--r--arch/x86/lib/copy_user_nocache_64.S136
-rw-r--r--arch/x86/lib/csum-copy_64.S17
-rw-r--r--arch/x86/lib/getuser.S13
-rw-r--r--arch/x86/lib/iomap_copy_64.S3
-rw-r--r--arch/x86/lib/memcpy_64.S3
-rw-r--r--arch/x86/lib/memmove_64.S3
-rw-r--r--arch/x86/lib/memset_64.S5
-rw-r--r--arch/x86/lib/msr-reg.S44
-rw-r--r--arch/x86/lib/putuser.S8
-rw-r--r--arch/x86/lib/rwsem.S49
-rw-r--r--arch/x86/mm/init.c6
-rw-r--r--arch/x86/mm/iomap_32.c12
-rw-r--r--arch/x86/mm/ioremap.c71
-rw-r--r--arch/x86/mm/pageattr-test.c1
-rw-r--r--arch/x86/mm/pageattr.c84
-rw-r--r--arch/x86/mm/pat.c337
-rw-r--r--arch/x86/mm/pat_internal.h2
-rw-r--r--arch/x86/mm/pat_rbtree.c6
-rw-r--r--arch/x86/mm/pgtable.c60
-rw-r--r--arch/x86/net/bpf_jit.S1
-rw-r--r--arch/x86/pci/i386.c6
-rw-r--r--arch/x86/pci/intel_mid_pci.c6
-rw-r--r--arch/x86/pci/irq.c13
-rw-r--r--arch/x86/platform/Makefile1
-rw-r--r--arch/x86/platform/atom/Makefile1
-rw-r--r--arch/x86/platform/atom/punit_atom_debug.c183
-rw-r--r--arch/x86/platform/intel-mid/device_libs/platform_wdt.c5
-rw-r--r--arch/x86/platform/intel-mid/intel-mid.c18
-rw-r--r--arch/x86/platform/intel-mid/sfi.c30
-rw-r--r--arch/x86/platform/sfi/sfi.c7
-rw-r--r--arch/x86/platform/uv/uv_irq.c298
-rw-r--r--arch/x86/power/hibernate_asm_64.S8
-rw-r--r--arch/x86/um/Makefile2
-rw-r--r--arch/x86/xen/enlighten.c8
-rw-r--r--arch/x86/xen/p2m.c1
-rw-r--r--arch/x86/xen/xen-asm_64.S28
-rw-r--r--arch/x86/xen/xen-ops.h2
-rw-r--r--arch/xtensa/include/asm/io.h1
198 files changed, 5783 insertions, 5710 deletions
diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
index cabd518cb253..7cc4ced5dbf4 100644
--- a/arch/arc/include/asm/io.h
+++ b/arch/arc/include/asm/io.h
@@ -20,6 +20,7 @@ extern void iounmap(const void __iomem *addr);
20 20
21#define ioremap_nocache(phy, sz) ioremap(phy, sz) 21#define ioremap_nocache(phy, sz) ioremap(phy, sz)
22#define ioremap_wc(phy, sz) ioremap(phy, sz) 22#define ioremap_wc(phy, sz) ioremap(phy, sz)
23#define ioremap_wt(phy, sz) ioremap(phy, sz)
23 24
24/* Change struct page to physical address */ 25/* Change struct page to physical address */
25#define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT) 26#define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index db58deb00aa7..1b7677d1e5e1 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -336,6 +336,7 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
336#define ioremap_nocache(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE) 336#define ioremap_nocache(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE)
337#define ioremap_cache(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE_CACHED) 337#define ioremap_cache(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE_CACHED)
338#define ioremap_wc(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE_WC) 338#define ioremap_wc(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE_WC)
339#define ioremap_wt(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE)
339#define iounmap __arm_iounmap 340#define iounmap __arm_iounmap
340 341
341/* 342/*
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 540f7c0aea82..7116d3973058 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -170,6 +170,7 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size);
170#define ioremap(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) 170#define ioremap(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
171#define ioremap_nocache(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) 171#define ioremap_nocache(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
172#define ioremap_wc(addr, size) __ioremap((addr), (size), __pgprot(PROT_NORMAL_NC)) 172#define ioremap_wc(addr, size) __ioremap((addr), (size), __pgprot(PROT_NORMAL_NC))
173#define ioremap_wt(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
173#define iounmap __iounmap 174#define iounmap __iounmap
174 175
175/* 176/*
diff --git a/arch/avr32/include/asm/io.h b/arch/avr32/include/asm/io.h
index 4f5ec2bb7172..e998ff5d8e1a 100644
--- a/arch/avr32/include/asm/io.h
+++ b/arch/avr32/include/asm/io.h
@@ -296,6 +296,7 @@ extern void __iounmap(void __iomem *addr);
296 __iounmap(addr) 296 __iounmap(addr)
297 297
298#define ioremap_wc ioremap_nocache 298#define ioremap_wc ioremap_nocache
299#define ioremap_wt ioremap_nocache
299 300
300#define cached(addr) P1SEGADDR(addr) 301#define cached(addr) P1SEGADDR(addr)
301#define uncached(addr) P2SEGADDR(addr) 302#define uncached(addr) P2SEGADDR(addr)
diff --git a/arch/frv/include/asm/io.h b/arch/frv/include/asm/io.h
index 0b78bc89e840..a31b63ec4930 100644
--- a/arch/frv/include/asm/io.h
+++ b/arch/frv/include/asm/io.h
@@ -17,6 +17,8 @@
17 17
18#ifdef __KERNEL__ 18#ifdef __KERNEL__
19 19
20#define ARCH_HAS_IOREMAP_WT
21
20#include <linux/types.h> 22#include <linux/types.h>
21#include <asm/virtconvert.h> 23#include <asm/virtconvert.h>
22#include <asm/string.h> 24#include <asm/string.h>
@@ -265,7 +267,7 @@ static inline void __iomem *ioremap_nocache(unsigned long physaddr, unsigned lon
265 return __ioremap(physaddr, size, IOMAP_NOCACHE_SER); 267 return __ioremap(physaddr, size, IOMAP_NOCACHE_SER);
266} 268}
267 269
268static inline void __iomem *ioremap_writethrough(unsigned long physaddr, unsigned long size) 270static inline void __iomem *ioremap_wt(unsigned long physaddr, unsigned long size)
269{ 271{
270 return __ioremap(physaddr, size, IOMAP_WRITETHROUGH); 272 return __ioremap(physaddr, size, IOMAP_WRITETHROUGH);
271} 273}
diff --git a/arch/ia64/include/asm/irq_remapping.h b/arch/ia64/include/asm/irq_remapping.h
index e3b3556e2e1b..a8687b1d8906 100644
--- a/arch/ia64/include/asm/irq_remapping.h
+++ b/arch/ia64/include/asm/irq_remapping.h
@@ -1,6 +1,4 @@
1#ifndef __IA64_INTR_REMAPPING_H 1#ifndef __IA64_INTR_REMAPPING_H
2#define __IA64_INTR_REMAPPING_H 2#define __IA64_INTR_REMAPPING_H
3#define irq_remapping_enabled 0 3#define irq_remapping_enabled 0
4#define dmar_alloc_hwirq create_irq
5#define dmar_free_hwirq destroy_irq
6#endif 4#endif
diff --git a/arch/ia64/kernel/msi_ia64.c b/arch/ia64/kernel/msi_ia64.c
index 9dd7464f8c17..d70bf15c690a 100644
--- a/arch/ia64/kernel/msi_ia64.c
+++ b/arch/ia64/kernel/msi_ia64.c
@@ -165,7 +165,7 @@ static struct irq_chip dmar_msi_type = {
165 .irq_retrigger = ia64_msi_retrigger_irq, 165 .irq_retrigger = ia64_msi_retrigger_irq,
166}; 166};
167 167
168static int 168static void
169msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg) 169msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg)
170{ 170{
171 struct irq_cfg *cfg = irq_cfg + irq; 171 struct irq_cfg *cfg = irq_cfg + irq;
@@ -186,21 +186,29 @@ msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg)
186 MSI_DATA_LEVEL_ASSERT | 186 MSI_DATA_LEVEL_ASSERT |
187 MSI_DATA_DELIVERY_FIXED | 187 MSI_DATA_DELIVERY_FIXED |
188 MSI_DATA_VECTOR(cfg->vector); 188 MSI_DATA_VECTOR(cfg->vector);
189 return 0;
190} 189}
191 190
192int arch_setup_dmar_msi(unsigned int irq) 191int dmar_alloc_hwirq(int id, int node, void *arg)
193{ 192{
194 int ret; 193 int irq;
195 struct msi_msg msg; 194 struct msi_msg msg;
196 195
197 ret = msi_compose_msg(NULL, irq, &msg); 196 irq = create_irq();
198 if (ret < 0) 197 if (irq > 0) {
199 return ret; 198 irq_set_handler_data(irq, arg);
200 dmar_msi_write(irq, &msg); 199 irq_set_chip_and_handler_name(irq, &dmar_msi_type,
201 irq_set_chip_and_handler_name(irq, &dmar_msi_type, handle_edge_irq, 200 handle_edge_irq, "edge");
202 "edge"); 201 msi_compose_msg(NULL, irq, &msg);
203 return 0; 202 dmar_msi_write(irq, &msg);
203 }
204
205 return irq;
206}
207
208void dmar_free_hwirq(int irq)
209{
210 irq_set_handler_data(irq, NULL);
211 destroy_irq(irq);
204} 212}
205#endif /* CONFIG_INTEL_IOMMU */ 213#endif /* CONFIG_INTEL_IOMMU */
206 214
diff --git a/arch/m32r/include/asm/io.h b/arch/m32r/include/asm/io.h
index 9cc00dbd59ce..0c3f25ee3381 100644
--- a/arch/m32r/include/asm/io.h
+++ b/arch/m32r/include/asm/io.h
@@ -68,6 +68,7 @@ static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
68extern void iounmap(volatile void __iomem *addr); 68extern void iounmap(volatile void __iomem *addr);
69#define ioremap_nocache(off,size) ioremap(off,size) 69#define ioremap_nocache(off,size) ioremap(off,size)
70#define ioremap_wc ioremap_nocache 70#define ioremap_wc ioremap_nocache
71#define ioremap_wt ioremap_nocache
71 72
72/* 73/*
73 * IO bus memory addresses are also 1:1 with the physical address 74 * IO bus memory addresses are also 1:1 with the physical address
diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index 8955b40a5dc4..618c85d3c786 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -20,6 +20,8 @@
20 20
21#ifdef __KERNEL__ 21#ifdef __KERNEL__
22 22
23#define ARCH_HAS_IOREMAP_WT
24
23#include <linux/compiler.h> 25#include <linux/compiler.h>
24#include <asm/raw_io.h> 26#include <asm/raw_io.h>
25#include <asm/virtconvert.h> 27#include <asm/virtconvert.h>
@@ -465,7 +467,7 @@ static inline void __iomem *ioremap_nocache(unsigned long physaddr, unsigned lon
465{ 467{
466 return __ioremap(physaddr, size, IOMAP_NOCACHE_SER); 468 return __ioremap(physaddr, size, IOMAP_NOCACHE_SER);
467} 469}
468static inline void __iomem *ioremap_writethrough(unsigned long physaddr, 470static inline void __iomem *ioremap_wt(unsigned long physaddr,
469 unsigned long size) 471 unsigned long size)
470{ 472{
471 return __ioremap(physaddr, size, IOMAP_WRITETHROUGH); 473 return __ioremap(physaddr, size, IOMAP_WRITETHROUGH);
diff --git a/arch/m68k/include/asm/io_no.h b/arch/m68k/include/asm/io_no.h
index a93c8cde4d38..ad7bd40e6742 100644
--- a/arch/m68k/include/asm/io_no.h
+++ b/arch/m68k/include/asm/io_no.h
@@ -3,6 +3,8 @@
3 3
4#ifdef __KERNEL__ 4#ifdef __KERNEL__
5 5
6#define ARCH_HAS_IOREMAP_WT
7
6#include <asm/virtconvert.h> 8#include <asm/virtconvert.h>
7#include <asm-generic/iomap.h> 9#include <asm-generic/iomap.h>
8 10
@@ -153,7 +155,7 @@ static inline void *ioremap_nocache(unsigned long physaddr, unsigned long size)
153{ 155{
154 return __ioremap(physaddr, size, IOMAP_NOCACHE_SER); 156 return __ioremap(physaddr, size, IOMAP_NOCACHE_SER);
155} 157}
156static inline void *ioremap_writethrough(unsigned long physaddr, unsigned long size) 158static inline void *ioremap_wt(unsigned long physaddr, unsigned long size)
157{ 159{
158 return __ioremap(physaddr, size, IOMAP_WRITETHROUGH); 160 return __ioremap(physaddr, size, IOMAP_WRITETHROUGH);
159} 161}
diff --git a/arch/metag/include/asm/io.h b/arch/metag/include/asm/io.h
index d5779b0ec573..9890f21eadbe 100644
--- a/arch/metag/include/asm/io.h
+++ b/arch/metag/include/asm/io.h
@@ -160,6 +160,9 @@ extern void __iounmap(void __iomem *addr);
160#define ioremap_wc(offset, size) \ 160#define ioremap_wc(offset, size) \
161 __ioremap((offset), (size), _PAGE_WR_COMBINE) 161 __ioremap((offset), (size), _PAGE_WR_COMBINE)
162 162
163#define ioremap_wt(offset, size) \
164 __ioremap((offset), (size), 0)
165
163#define iounmap(addr) \ 166#define iounmap(addr) \
164 __iounmap(addr) 167 __iounmap(addr)
165 168
diff --git a/arch/microblaze/include/asm/io.h b/arch/microblaze/include/asm/io.h
index 940f5fc1d1da..39b6315db82e 100644
--- a/arch/microblaze/include/asm/io.h
+++ b/arch/microblaze/include/asm/io.h
@@ -39,10 +39,10 @@ extern resource_size_t isa_mem_base;
39extern void iounmap(void __iomem *addr); 39extern void iounmap(void __iomem *addr);
40 40
41extern void __iomem *ioremap(phys_addr_t address, unsigned long size); 41extern void __iomem *ioremap(phys_addr_t address, unsigned long size);
42#define ioremap_writethrough(addr, size) ioremap((addr), (size))
43#define ioremap_nocache(addr, size) ioremap((addr), (size)) 42#define ioremap_nocache(addr, size) ioremap((addr), (size))
44#define ioremap_fullcache(addr, size) ioremap((addr), (size)) 43#define ioremap_fullcache(addr, size) ioremap((addr), (size))
45#define ioremap_wc(addr, size) ioremap((addr), (size)) 44#define ioremap_wc(addr, size) ioremap((addr), (size))
45#define ioremap_wt(addr, size) ioremap((addr), (size))
46 46
47#endif /* CONFIG_MMU */ 47#endif /* CONFIG_MMU */
48 48
diff --git a/arch/mn10300/include/asm/io.h b/arch/mn10300/include/asm/io.h
index cc4a2ba9e228..07c5b4a3903b 100644
--- a/arch/mn10300/include/asm/io.h
+++ b/arch/mn10300/include/asm/io.h
@@ -282,6 +282,7 @@ static inline void __iomem *ioremap_nocache(unsigned long offset, unsigned long
282} 282}
283 283
284#define ioremap_wc ioremap_nocache 284#define ioremap_wc ioremap_nocache
285#define ioremap_wt ioremap_nocache
285 286
286static inline void iounmap(void __iomem *addr) 287static inline void iounmap(void __iomem *addr)
287{ 288{
diff --git a/arch/nios2/include/asm/io.h b/arch/nios2/include/asm/io.h
index 6e24d7cceb0c..c5a62da22cd2 100644
--- a/arch/nios2/include/asm/io.h
+++ b/arch/nios2/include/asm/io.h
@@ -46,6 +46,7 @@ static inline void iounmap(void __iomem *addr)
46} 46}
47 47
48#define ioremap_wc ioremap_nocache 48#define ioremap_wc ioremap_nocache
49#define ioremap_wt ioremap_nocache
49 50
50/* Pages to physical address... */ 51/* Pages to physical address... */
51#define page_to_phys(page) virt_to_phys(page_to_virt(page)) 52#define page_to_phys(page) virt_to_phys(page_to_virt(page))
diff --git a/arch/s390/include/asm/io.h b/arch/s390/include/asm/io.h
index 30fd5c84680e..cb5fdf3a78fc 100644
--- a/arch/s390/include/asm/io.h
+++ b/arch/s390/include/asm/io.h
@@ -29,6 +29,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
29 29
30#define ioremap_nocache(addr, size) ioremap(addr, size) 30#define ioremap_nocache(addr, size) ioremap(addr, size)
31#define ioremap_wc ioremap_nocache 31#define ioremap_wc ioremap_nocache
32#define ioremap_wt ioremap_nocache
32 33
33static inline void __iomem *ioremap(unsigned long offset, unsigned long size) 34static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
34{ 35{
diff --git a/arch/sparc/include/asm/io_32.h b/arch/sparc/include/asm/io_32.h
index 407ac14295f4..57f26c398dc9 100644
--- a/arch/sparc/include/asm/io_32.h
+++ b/arch/sparc/include/asm/io_32.h
@@ -129,6 +129,7 @@ static inline void sbus_memcpy_toio(volatile void __iomem *dst,
129void __iomem *ioremap(unsigned long offset, unsigned long size); 129void __iomem *ioremap(unsigned long offset, unsigned long size);
130#define ioremap_nocache(X,Y) ioremap((X),(Y)) 130#define ioremap_nocache(X,Y) ioremap((X),(Y))
131#define ioremap_wc(X,Y) ioremap((X),(Y)) 131#define ioremap_wc(X,Y) ioremap((X),(Y))
132#define ioremap_wt(X,Y) ioremap((X),(Y))
132void iounmap(volatile void __iomem *addr); 133void iounmap(volatile void __iomem *addr);
133 134
134/* Create a virtual mapping cookie for an IO port range */ 135/* Create a virtual mapping cookie for an IO port range */
diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h
index 50d4840d9aeb..c32fa3f752c8 100644
--- a/arch/sparc/include/asm/io_64.h
+++ b/arch/sparc/include/asm/io_64.h
@@ -402,6 +402,7 @@ static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
402 402
403#define ioremap_nocache(X,Y) ioremap((X),(Y)) 403#define ioremap_nocache(X,Y) ioremap((X),(Y))
404#define ioremap_wc(X,Y) ioremap((X),(Y)) 404#define ioremap_wc(X,Y) ioremap((X),(Y))
405#define ioremap_wt(X,Y) ioremap((X),(Y))
405 406
406static inline void iounmap(volatile void __iomem *addr) 407static inline void iounmap(volatile void __iomem *addr)
407{ 408{
diff --git a/arch/tile/include/asm/io.h b/arch/tile/include/asm/io.h
index 6ef4ecab1df2..dc61de15c1f9 100644
--- a/arch/tile/include/asm/io.h
+++ b/arch/tile/include/asm/io.h
@@ -54,7 +54,7 @@ extern void iounmap(volatile void __iomem *addr);
54 54
55#define ioremap_nocache(physaddr, size) ioremap(physaddr, size) 55#define ioremap_nocache(physaddr, size) ioremap(physaddr, size)
56#define ioremap_wc(physaddr, size) ioremap(physaddr, size) 56#define ioremap_wc(physaddr, size) ioremap(physaddr, size)
57#define ioremap_writethrough(physaddr, size) ioremap(physaddr, size) 57#define ioremap_wt(physaddr, size) ioremap(physaddr, size)
58#define ioremap_fullcache(physaddr, size) ioremap(physaddr, size) 58#define ioremap_fullcache(physaddr, size) ioremap(physaddr, size)
59 59
60#define mmiowb() 60#define mmiowb()
diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index 3942f74c92d7..1538562cc720 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -1,3 +1,6 @@
1
2obj-y += entry/
3
1obj-$(CONFIG_KVM) += kvm/ 4obj-$(CONFIG_KVM) += kvm/
2 5
3# Xen paravirtualization support 6# Xen paravirtualization support
@@ -11,7 +14,7 @@ obj-y += kernel/
11obj-y += mm/ 14obj-y += mm/
12 15
13obj-y += crypto/ 16obj-y += crypto/
14obj-y += vdso/ 17
15obj-$(CONFIG_IA32_EMULATION) += ia32/ 18obj-$(CONFIG_IA32_EMULATION) += ia32/
16 19
17obj-y += platform/ 20obj-y += platform/
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4e986e809861..7e39f9b22705 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -9,141 +9,141 @@ config 64BIT
9config X86_32 9config X86_32
10 def_bool y 10 def_bool y
11 depends on !64BIT 11 depends on !64BIT
12 select CLKSRC_I8253
13 select HAVE_UID16
14 12
15config X86_64 13config X86_64
16 def_bool y 14 def_bool y
17 depends on 64BIT 15 depends on 64BIT
18 select X86_DEV_DMA_OPS
19 select ARCH_USE_CMPXCHG_LOCKREF
20 select HAVE_LIVEPATCH
21 16
22### Arch settings 17### Arch settings
23config X86 18config X86
24 def_bool y 19 def_bool y
25 select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI 20 select ACPI_LEGACY_TABLES_LOOKUP if ACPI
26 select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI 21 select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
22 select ANON_INODES
23 select ARCH_CLOCKSOURCE_DATA
24 select ARCH_DISCARD_MEMBLOCK
25 select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
27 select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS 26 select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
27 select ARCH_HAS_ELF_RANDOMIZE
28 select ARCH_HAS_FAST_MULTIPLIER 28 select ARCH_HAS_FAST_MULTIPLIER
29 select ARCH_HAS_GCOV_PROFILE_ALL 29 select ARCH_HAS_GCOV_PROFILE_ALL
30 select ARCH_HAS_SG_CHAIN
31 select ARCH_HAVE_NMI_SAFE_CMPXCHG
32 select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
30 select ARCH_MIGHT_HAVE_PC_PARPORT 33 select ARCH_MIGHT_HAVE_PC_PARPORT
31 select ARCH_MIGHT_HAVE_PC_SERIO 34 select ARCH_MIGHT_HAVE_PC_SERIO
32 select HAVE_AOUT if X86_32 35 select ARCH_SUPPORTS_ATOMIC_RMW
33 select HAVE_UNSTABLE_SCHED_CLOCK 36 select ARCH_SUPPORTS_INT128 if X86_64
34 select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 37 select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
35 select ARCH_SUPPORTS_INT128 if X86_64 38 select ARCH_USE_BUILTIN_BSWAP
36 select HAVE_IDE 39 select ARCH_USE_CMPXCHG_LOCKREF if X86_64
37 select HAVE_OPROFILE 40 select ARCH_USE_QUEUED_RWLOCKS
38 select HAVE_PCSPKR_PLATFORM 41 select ARCH_USE_QUEUED_SPINLOCKS
39 select HAVE_PERF_EVENTS
40 select HAVE_IOREMAP_PROT
41 select HAVE_KPROBES
42 select HAVE_MEMBLOCK
43 select HAVE_MEMBLOCK_NODE_MAP
44 select ARCH_DISCARD_MEMBLOCK
45 select ARCH_WANT_OPTIONAL_GPIOLIB
46 select ARCH_WANT_FRAME_POINTERS 42 select ARCH_WANT_FRAME_POINTERS
47 select HAVE_DMA_ATTRS 43 select ARCH_WANT_IPC_PARSE_VERSION if X86_32
48 select HAVE_DMA_CONTIGUOUS 44 select ARCH_WANT_OPTIONAL_GPIOLIB
49 select HAVE_KRETPROBES 45 select BUILDTIME_EXTABLE_SORT
46 select CLKEVT_I8253
47 select CLKSRC_I8253 if X86_32
48 select CLOCKSOURCE_VALIDATE_LAST_CYCLE
49 select CLOCKSOURCE_WATCHDOG
50 select CLONE_BACKWARDS if X86_32
51 select COMPAT_OLD_SIGACTION if IA32_EMULATION
52 select DCACHE_WORD_ACCESS
53 select GENERIC_CLOCKEVENTS
54 select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
55 select GENERIC_CLOCKEVENTS_MIN_ADJUST
56 select GENERIC_CMOS_UPDATE
57 select GENERIC_CPU_AUTOPROBE
50 select GENERIC_EARLY_IOREMAP 58 select GENERIC_EARLY_IOREMAP
51 select HAVE_OPTPROBES 59 select GENERIC_FIND_FIRST_BIT
52 select HAVE_KPROBES_ON_FTRACE 60 select GENERIC_IOMAP
53 select HAVE_FTRACE_MCOUNT_RECORD 61 select GENERIC_IRQ_PROBE
54 select HAVE_FENTRY if X86_64 62 select GENERIC_IRQ_SHOW
63 select GENERIC_PENDING_IRQ if SMP
64 select GENERIC_SMP_IDLE_THREAD
65 select GENERIC_STRNCPY_FROM_USER
66 select GENERIC_STRNLEN_USER
67 select GENERIC_TIME_VSYSCALL
68 select HAVE_ACPI_APEI if ACPI
69 select HAVE_ACPI_APEI_NMI if ACPI
70 select HAVE_ALIGNED_STRUCT_PAGE if SLUB
71 select HAVE_AOUT if X86_32
72 select HAVE_ARCH_AUDITSYSCALL
73 select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
74 select HAVE_ARCH_JUMP_LABEL
75 select HAVE_ARCH_KASAN if X86_64 && SPARSEMEM_VMEMMAP
76 select HAVE_ARCH_KGDB
77 select HAVE_ARCH_KMEMCHECK
78 select HAVE_ARCH_SECCOMP_FILTER
79 select HAVE_ARCH_SOFT_DIRTY if X86_64
80 select HAVE_ARCH_TRACEHOOK
81 select HAVE_ARCH_TRANSPARENT_HUGEPAGE
82 select HAVE_BPF_JIT if X86_64
83 select HAVE_CC_STACKPROTECTOR
84 select HAVE_CMPXCHG_DOUBLE
85 select HAVE_CMPXCHG_LOCAL
86 select HAVE_CONTEXT_TRACKING if X86_64
55 select HAVE_C_RECORDMCOUNT 87 select HAVE_C_RECORDMCOUNT
88 select HAVE_DEBUG_KMEMLEAK
89 select HAVE_DEBUG_STACKOVERFLOW
90 select HAVE_DMA_API_DEBUG
91 select HAVE_DMA_ATTRS
92 select HAVE_DMA_CONTIGUOUS
56 select HAVE_DYNAMIC_FTRACE 93 select HAVE_DYNAMIC_FTRACE
57 select HAVE_DYNAMIC_FTRACE_WITH_REGS 94 select HAVE_DYNAMIC_FTRACE_WITH_REGS
58 select HAVE_FUNCTION_TRACER
59 select HAVE_FUNCTION_GRAPH_TRACER
60 select HAVE_FUNCTION_GRAPH_FP_TEST
61 select HAVE_SYSCALL_TRACEPOINTS
62 select SYSCTL_EXCEPTION_TRACE
63 select HAVE_KVM
64 select HAVE_ARCH_KGDB
65 select HAVE_ARCH_TRACEHOOK
66 select HAVE_GENERIC_DMA_COHERENT if X86_32
67 select HAVE_EFFICIENT_UNALIGNED_ACCESS 95 select HAVE_EFFICIENT_UNALIGNED_ACCESS
68 select USER_STACKTRACE_SUPPORT 96 select HAVE_FENTRY if X86_64
69 select HAVE_REGS_AND_STACK_ACCESS_API 97 select HAVE_FTRACE_MCOUNT_RECORD
70 select HAVE_DMA_API_DEBUG 98 select HAVE_FUNCTION_GRAPH_FP_TEST
71 select HAVE_KERNEL_GZIP 99 select HAVE_FUNCTION_GRAPH_TRACER
100 select HAVE_FUNCTION_TRACER
101 select HAVE_GENERIC_DMA_COHERENT if X86_32
102 select HAVE_HW_BREAKPOINT
103 select HAVE_IDE
104 select HAVE_IOREMAP_PROT
105 select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
106 select HAVE_IRQ_TIME_ACCOUNTING
72 select HAVE_KERNEL_BZIP2 107 select HAVE_KERNEL_BZIP2
108 select HAVE_KERNEL_GZIP
109 select HAVE_KERNEL_LZ4
73 select HAVE_KERNEL_LZMA 110 select HAVE_KERNEL_LZMA
74 select HAVE_KERNEL_XZ
75 select HAVE_KERNEL_LZO 111 select HAVE_KERNEL_LZO
76 select HAVE_KERNEL_LZ4 112 select HAVE_KERNEL_XZ
77 select HAVE_HW_BREAKPOINT 113 select HAVE_KPROBES
114 select HAVE_KPROBES_ON_FTRACE
115 select HAVE_KRETPROBES
116 select HAVE_KVM
117 select HAVE_LIVEPATCH if X86_64
118 select HAVE_MEMBLOCK
119 select HAVE_MEMBLOCK_NODE_MAP
78 select HAVE_MIXED_BREAKPOINTS_REGS 120 select HAVE_MIXED_BREAKPOINTS_REGS
79 select PERF_EVENTS 121 select HAVE_OPROFILE
122 select HAVE_OPTPROBES
123 select HAVE_PCSPKR_PLATFORM
124 select HAVE_PERF_EVENTS
80 select HAVE_PERF_EVENTS_NMI 125 select HAVE_PERF_EVENTS_NMI
81 select HAVE_PERF_REGS 126 select HAVE_PERF_REGS
82 select HAVE_PERF_USER_STACK_DUMP 127 select HAVE_PERF_USER_STACK_DUMP
83 select HAVE_DEBUG_KMEMLEAK 128 select HAVE_REGS_AND_STACK_ACCESS_API
84 select ANON_INODES 129 select HAVE_SYSCALL_TRACEPOINTS
85 select HAVE_ALIGNED_STRUCT_PAGE if SLUB 130 select HAVE_UID16 if X86_32
86 select HAVE_CMPXCHG_LOCAL 131 select HAVE_UNSTABLE_SCHED_CLOCK
87 select HAVE_CMPXCHG_DOUBLE
88 select HAVE_ARCH_KMEMCHECK
89 select HAVE_ARCH_KASAN if X86_64 && SPARSEMEM_VMEMMAP
90 select HAVE_USER_RETURN_NOTIFIER 132 select HAVE_USER_RETURN_NOTIFIER
91 select ARCH_HAS_ELF_RANDOMIZE
92 select HAVE_ARCH_JUMP_LABEL
93 select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
94 select SPARSE_IRQ
95 select GENERIC_FIND_FIRST_BIT
96 select GENERIC_IRQ_PROBE
97 select GENERIC_PENDING_IRQ if SMP
98 select GENERIC_IRQ_SHOW
99 select GENERIC_CLOCKEVENTS_MIN_ADJUST
100 select IRQ_FORCED_THREADING 133 select IRQ_FORCED_THREADING
101 select HAVE_BPF_JIT if X86_64 134 select MODULES_USE_ELF_RELA if X86_64
102 select HAVE_ARCH_TRANSPARENT_HUGEPAGE 135 select MODULES_USE_ELF_REL if X86_32
103 select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE) 136 select OLD_SIGACTION if X86_32
104 select ARCH_HAS_SG_CHAIN 137 select OLD_SIGSUSPEND3 if X86_32 || IA32_EMULATION
105 select CLKEVT_I8253 138 select PERF_EVENTS
106 select ARCH_HAVE_NMI_SAFE_CMPXCHG
107 select GENERIC_IOMAP
108 select DCACHE_WORD_ACCESS
109 select GENERIC_SMP_IDLE_THREAD
110 select ARCH_WANT_IPC_PARSE_VERSION if X86_32
111 select HAVE_ARCH_SECCOMP_FILTER
112 select BUILDTIME_EXTABLE_SORT
113 select GENERIC_CMOS_UPDATE
114 select HAVE_ARCH_SOFT_DIRTY if X86_64
115 select CLOCKSOURCE_WATCHDOG
116 select GENERIC_CLOCKEVENTS
117 select ARCH_CLOCKSOURCE_DATA
118 select CLOCKSOURCE_VALIDATE_LAST_CYCLE
119 select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
120 select GENERIC_TIME_VSYSCALL
121 select GENERIC_STRNCPY_FROM_USER
122 select GENERIC_STRNLEN_USER
123 select HAVE_CONTEXT_TRACKING if X86_64
124 select HAVE_IRQ_TIME_ACCOUNTING
125 select VIRT_TO_BUS
126 select MODULES_USE_ELF_REL if X86_32
127 select MODULES_USE_ELF_RELA if X86_64
128 select CLONE_BACKWARDS if X86_32
129 select ARCH_USE_BUILTIN_BSWAP
130 select ARCH_USE_QUEUED_SPINLOCKS
131 select ARCH_USE_QUEUED_RWLOCKS
132 select OLD_SIGSUSPEND3 if X86_32 || IA32_EMULATION
133 select OLD_SIGACTION if X86_32
134 select COMPAT_OLD_SIGACTION if IA32_EMULATION
135 select RTC_LIB 139 select RTC_LIB
136 select HAVE_DEBUG_STACKOVERFLOW 140 select SPARSE_IRQ
137 select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
138 select HAVE_CC_STACKPROTECTOR
139 select GENERIC_CPU_AUTOPROBE
140 select HAVE_ARCH_AUDITSYSCALL
141 select ARCH_SUPPORTS_ATOMIC_RMW
142 select HAVE_ACPI_APEI if ACPI
143 select HAVE_ACPI_APEI_NMI if ACPI
144 select ACPI_LEGACY_TABLES_LOOKUP if ACPI
145 select X86_FEATURE_NAMES if PROC_FS
146 select SRCU 141 select SRCU
142 select SYSCTL_EXCEPTION_TRACE
143 select USER_STACKTRACE_SUPPORT
144 select VIRT_TO_BUS
145 select X86_DEV_DMA_OPS if X86_64
146 select X86_FEATURE_NAMES if PROC_FS
147 147
148config INSTRUCTION_DECODER 148config INSTRUCTION_DECODER
149 def_bool y 149 def_bool y
@@ -261,10 +261,6 @@ config X86_64_SMP
261 def_bool y 261 def_bool y
262 depends on X86_64 && SMP 262 depends on X86_64 && SMP
263 263
264config X86_HT
265 def_bool y
266 depends on SMP
267
268config X86_32_LAZY_GS 264config X86_32_LAZY_GS
269 def_bool y 265 def_bool y
270 depends on X86_32 && !CC_STACKPROTECTOR 266 depends on X86_32 && !CC_STACKPROTECTOR
@@ -342,7 +338,7 @@ config X86_FEATURE_NAMES
342 338
343config X86_X2APIC 339config X86_X2APIC
344 bool "Support x2apic" 340 bool "Support x2apic"
345 depends on X86_LOCAL_APIC && X86_64 && IRQ_REMAP 341 depends on X86_LOCAL_APIC && X86_64 && (IRQ_REMAP || HYPERVISOR_GUEST)
346 ---help--- 342 ---help---
347 This enables x2apic support on CPUs that have this feature. 343 This enables x2apic support on CPUs that have this feature.
348 344
@@ -442,6 +438,7 @@ config X86_UV
442 depends on X86_EXTENDED_PLATFORM 438 depends on X86_EXTENDED_PLATFORM
443 depends on NUMA 439 depends on NUMA
444 depends on X86_X2APIC 440 depends on X86_X2APIC
441 depends on PCI
445 ---help--- 442 ---help---
446 This option is needed in order to support SGI Ultraviolet systems. 443 This option is needed in order to support SGI Ultraviolet systems.
447 If you don't have one of these, you should say N here. 444 If you don't have one of these, you should say N here.
@@ -467,7 +464,6 @@ config X86_INTEL_CE
467 select X86_REBOOTFIXUPS 464 select X86_REBOOTFIXUPS
468 select OF 465 select OF
469 select OF_EARLY_FLATTREE 466 select OF_EARLY_FLATTREE
470 select IRQ_DOMAIN
471 ---help--- 467 ---help---
472 Select for the Intel CE media processor (CE4100) SOC. 468 Select for the Intel CE media processor (CE4100) SOC.
473 This option compiles in support for the CE4100 SOC for settop 469 This option compiles in support for the CE4100 SOC for settop
@@ -852,11 +848,12 @@ config NR_CPUS
852 default "1" if !SMP 848 default "1" if !SMP
853 default "8192" if MAXSMP 849 default "8192" if MAXSMP
854 default "32" if SMP && X86_BIGSMP 850 default "32" if SMP && X86_BIGSMP
855 default "8" if SMP 851 default "8" if SMP && X86_32
852 default "64" if SMP
856 ---help--- 853 ---help---
857 This allows you to specify the maximum number of CPUs which this 854 This allows you to specify the maximum number of CPUs which this
858 kernel will support. If CPUMASK_OFFSTACK is enabled, the maximum 855 kernel will support. If CPUMASK_OFFSTACK is enabled, the maximum
859 supported value is 4096, otherwise the maximum value is 512. The 856 supported value is 8192, otherwise the maximum value is 512. The
860 minimum value which makes sense is 2. 857 minimum value which makes sense is 2.
861 858
862 This is purely to save memory - each supported CPU adds 859 This is purely to save memory - each supported CPU adds
@@ -864,7 +861,7 @@ config NR_CPUS
864 861
865config SCHED_SMT 862config SCHED_SMT
866 bool "SMT (Hyperthreading) scheduler support" 863 bool "SMT (Hyperthreading) scheduler support"
867 depends on X86_HT 864 depends on SMP
868 ---help--- 865 ---help---
869 SMT scheduler support improves the CPU scheduler's decision making 866 SMT scheduler support improves the CPU scheduler's decision making
870 when dealing with Intel Pentium 4 chips with HyperThreading at a 867 when dealing with Intel Pentium 4 chips with HyperThreading at a
@@ -874,7 +871,7 @@ config SCHED_SMT
874config SCHED_MC 871config SCHED_MC
875 def_bool y 872 def_bool y
876 prompt "Multi-core scheduler support" 873 prompt "Multi-core scheduler support"
877 depends on X86_HT 874 depends on SMP
878 ---help--- 875 ---help---
879 Multi-core scheduler support improves the CPU scheduler's decision 876 Multi-core scheduler support improves the CPU scheduler's decision
880 making when dealing with multi-core CPU chips at a cost of slightly 877 making when dealing with multi-core CPU chips at a cost of slightly
@@ -915,12 +912,12 @@ config X86_UP_IOAPIC
915config X86_LOCAL_APIC 912config X86_LOCAL_APIC
916 def_bool y 913 def_bool y
917 depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI 914 depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI
918 select GENERIC_IRQ_LEGACY_ALLOC_HWIRQ 915 select IRQ_DOMAIN_HIERARCHY
916 select PCI_MSI_IRQ_DOMAIN if PCI_MSI
919 917
920config X86_IO_APIC 918config X86_IO_APIC
921 def_bool y 919 def_bool y
922 depends on X86_LOCAL_APIC || X86_UP_IOAPIC 920 depends on X86_LOCAL_APIC || X86_UP_IOAPIC
923 select IRQ_DOMAIN
924 921
925config X86_REROUTE_FOR_BROKEN_BOOT_IRQS 922config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
926 bool "Reroute for broken boot IRQs" 923 bool "Reroute for broken boot IRQs"
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2fd3ebbb4e33..a15893d17c55 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -344,4 +344,15 @@ config X86_DEBUG_FPU
344 344
345 If unsure, say N. 345 If unsure, say N.
346 346
347config PUNIT_ATOM_DEBUG
348 tristate "ATOM Punit debug driver"
349 select DEBUG_FS
350 select IOSF_MBI
351 ---help---
352 This is a debug driver, which gets the power states
353 of all Punit North Complex devices. The power states of
354 each device is exposed as part of the debugfs interface.
355 The current power state can be read from
356 /sys/kernel/debug/punit_atom/dev_power_state
357
347endmenu 358endmenu
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2fda005bb334..118e6debc483 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -77,6 +77,12 @@ else
77 KBUILD_AFLAGS += -m64 77 KBUILD_AFLAGS += -m64
78 KBUILD_CFLAGS += -m64 78 KBUILD_CFLAGS += -m64
79 79
80 # Align jump targets to 1 byte, not the default 16 bytes:
81 KBUILD_CFLAGS += -falign-jumps=1
82
83 # Pack loops tightly as well:
84 KBUILD_CFLAGS += -falign-loops=1
85
80 # Don't autogenerate traditional x87 instructions 86 # Don't autogenerate traditional x87 instructions
81 KBUILD_CFLAGS += $(call cc-option,-mno-80387) 87 KBUILD_CFLAGS += $(call cc-option,-mno-80387)
82 KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387) 88 KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
@@ -84,6 +90,9 @@ else
84 # Use -mpreferred-stack-boundary=3 if supported. 90 # Use -mpreferred-stack-boundary=3 if supported.
85 KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3) 91 KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3)
86 92
93 # Use -mskip-rax-setup if supported.
94 KBUILD_CFLAGS += $(call cc-option,-mskip-rax-setup)
95
87 # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu) 96 # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
88 cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8) 97 cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
89 cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona) 98 cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
@@ -140,12 +149,6 @@ endif
140sp-$(CONFIG_X86_32) := esp 149sp-$(CONFIG_X86_32) := esp
141sp-$(CONFIG_X86_64) := rsp 150sp-$(CONFIG_X86_64) := rsp
142 151
143# do binutils support CFI?
144cfi := $(call as-instr,.cfi_startproc\n.cfi_rel_offset $(sp-y)$(comma)0\n.cfi_endproc,-DCONFIG_AS_CFI=1)
145# is .cfi_signal_frame supported too?
146cfi-sigframe := $(call as-instr,.cfi_startproc\n.cfi_signal_frame\n.cfi_endproc,-DCONFIG_AS_CFI_SIGNAL_FRAME=1)
147cfi-sections := $(call as-instr,.cfi_sections .debug_frame,-DCONFIG_AS_CFI_SECTIONS=1)
148
149# does binutils support specific instructions? 152# does binutils support specific instructions?
150asinstr := $(call as-instr,fxsaveq (%rax),-DCONFIG_AS_FXSAVEQ=1) 153asinstr := $(call as-instr,fxsaveq (%rax),-DCONFIG_AS_FXSAVEQ=1)
151asinstr += $(call as-instr,pshufb %xmm0$(comma)%xmm0,-DCONFIG_AS_SSSE3=1) 154asinstr += $(call as-instr,pshufb %xmm0$(comma)%xmm0,-DCONFIG_AS_SSSE3=1)
@@ -153,8 +156,8 @@ asinstr += $(call as-instr,crc32l %eax$(comma)%eax,-DCONFIG_AS_CRC32=1)
153avx_instr := $(call as-instr,vxorps %ymm0$(comma)%ymm1$(comma)%ymm2,-DCONFIG_AS_AVX=1) 156avx_instr := $(call as-instr,vxorps %ymm0$(comma)%ymm1$(comma)%ymm2,-DCONFIG_AS_AVX=1)
154avx2_instr :=$(call as-instr,vpbroadcastb %xmm0$(comma)%ymm1,-DCONFIG_AS_AVX2=1) 157avx2_instr :=$(call as-instr,vpbroadcastb %xmm0$(comma)%ymm1,-DCONFIG_AS_AVX2=1)
155 158
156KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) $(avx2_instr) 159KBUILD_AFLAGS += $(asinstr) $(avx_instr) $(avx2_instr)
157KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) $(avx2_instr) 160KBUILD_CFLAGS += $(asinstr) $(avx_instr) $(avx2_instr)
158 161
159LDFLAGS := -m elf_$(UTS_MACHINE) 162LDFLAGS := -m elf_$(UTS_MACHINE)
160 163
@@ -178,7 +181,7 @@ archscripts: scripts_basic
178# Syscall table generation 181# Syscall table generation
179 182
180archheaders: 183archheaders:
181 $(Q)$(MAKE) $(build)=arch/x86/syscalls all 184 $(Q)$(MAKE) $(build)=arch/x86/entry/syscalls all
182 185
183archprepare: 186archprepare:
184ifeq ($(CONFIG_KEXEC_FILE),y) 187ifeq ($(CONFIG_KEXEC_FILE),y)
@@ -241,7 +244,7 @@ install:
241 244
242PHONY += vdso_install 245PHONY += vdso_install
243vdso_install: 246vdso_install:
244 $(Q)$(MAKE) $(build)=arch/x86/vdso $@ 247 $(Q)$(MAKE) $(build)=arch/x86/entry/vdso $@
245 248
246archclean: 249archclean:
247 $(Q)rm -rf $(objtree)/arch/i386 250 $(Q)rm -rf $(objtree)/arch/i386
diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
new file mode 100644
index 000000000000..7a144971db79
--- /dev/null
+++ b/arch/x86/entry/Makefile
@@ -0,0 +1,10 @@
1#
2# Makefile for the x86 low level entry code
3#
4obj-y := entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
5
6obj-y += vdso/
7obj-y += vsyscall/
8
9obj-$(CONFIG_IA32_EMULATION) += entry_64_compat.o syscall_32.o
10
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/entry/calling.h
index 1c8b50edb2db..f4e6308c4200 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/entry/calling.h
@@ -46,8 +46,6 @@ For 32-bit we have the following conventions - kernel is built with
46 46
47*/ 47*/
48 48
49#include <asm/dwarf2.h>
50
51#ifdef CONFIG_X86_64 49#ifdef CONFIG_X86_64
52 50
53/* 51/*
@@ -91,28 +89,27 @@ For 32-bit we have the following conventions - kernel is built with
91#define SIZEOF_PTREGS 21*8 89#define SIZEOF_PTREGS 21*8
92 90
93 .macro ALLOC_PT_GPREGS_ON_STACK addskip=0 91 .macro ALLOC_PT_GPREGS_ON_STACK addskip=0
94 subq $15*8+\addskip, %rsp 92 addq $-(15*8+\addskip), %rsp
95 CFI_ADJUST_CFA_OFFSET 15*8+\addskip
96 .endm 93 .endm
97 94
98 .macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8910=1 r11=1 95 .macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8910=1 r11=1
99 .if \r11 96 .if \r11
100 movq_cfi r11, 6*8+\offset 97 movq %r11, 6*8+\offset(%rsp)
101 .endif 98 .endif
102 .if \r8910 99 .if \r8910
103 movq_cfi r10, 7*8+\offset 100 movq %r10, 7*8+\offset(%rsp)
104 movq_cfi r9, 8*8+\offset 101 movq %r9, 8*8+\offset(%rsp)
105 movq_cfi r8, 9*8+\offset 102 movq %r8, 9*8+\offset(%rsp)
106 .endif 103 .endif
107 .if \rax 104 .if \rax
108 movq_cfi rax, 10*8+\offset 105 movq %rax, 10*8+\offset(%rsp)
109 .endif 106 .endif
110 .if \rcx 107 .if \rcx
111 movq_cfi rcx, 11*8+\offset 108 movq %rcx, 11*8+\offset(%rsp)
112 .endif 109 .endif
113 movq_cfi rdx, 12*8+\offset 110 movq %rdx, 12*8+\offset(%rsp)
114 movq_cfi rsi, 13*8+\offset 111 movq %rsi, 13*8+\offset(%rsp)
115 movq_cfi rdi, 14*8+\offset 112 movq %rdi, 14*8+\offset(%rsp)
116 .endm 113 .endm
117 .macro SAVE_C_REGS offset=0 114 .macro SAVE_C_REGS offset=0
118 SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1 115 SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1
@@ -131,24 +128,24 @@ For 32-bit we have the following conventions - kernel is built with
131 .endm 128 .endm
132 129
133 .macro SAVE_EXTRA_REGS offset=0 130 .macro SAVE_EXTRA_REGS offset=0
134 movq_cfi r15, 0*8+\offset 131 movq %r15, 0*8+\offset(%rsp)
135 movq_cfi r14, 1*8+\offset 132 movq %r14, 1*8+\offset(%rsp)
136 movq_cfi r13, 2*8+\offset 133 movq %r13, 2*8+\offset(%rsp)
137 movq_cfi r12, 3*8+\offset 134 movq %r12, 3*8+\offset(%rsp)
138 movq_cfi rbp, 4*8+\offset 135 movq %rbp, 4*8+\offset(%rsp)
139 movq_cfi rbx, 5*8+\offset 136 movq %rbx, 5*8+\offset(%rsp)
140 .endm 137 .endm
141 .macro SAVE_EXTRA_REGS_RBP offset=0 138 .macro SAVE_EXTRA_REGS_RBP offset=0
142 movq_cfi rbp, 4*8+\offset 139 movq %rbp, 4*8+\offset(%rsp)
143 .endm 140 .endm
144 141
145 .macro RESTORE_EXTRA_REGS offset=0 142 .macro RESTORE_EXTRA_REGS offset=0
146 movq_cfi_restore 0*8+\offset, r15 143 movq 0*8+\offset(%rsp), %r15
147 movq_cfi_restore 1*8+\offset, r14 144 movq 1*8+\offset(%rsp), %r14
148 movq_cfi_restore 2*8+\offset, r13 145 movq 2*8+\offset(%rsp), %r13
149 movq_cfi_restore 3*8+\offset, r12 146 movq 3*8+\offset(%rsp), %r12
150 movq_cfi_restore 4*8+\offset, rbp 147 movq 4*8+\offset(%rsp), %rbp
151 movq_cfi_restore 5*8+\offset, rbx 148 movq 5*8+\offset(%rsp), %rbx
152 .endm 149 .endm
153 150
154 .macro ZERO_EXTRA_REGS 151 .macro ZERO_EXTRA_REGS
@@ -162,24 +159,24 @@ For 32-bit we have the following conventions - kernel is built with
162 159
163 .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1 160 .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
164 .if \rstor_r11 161 .if \rstor_r11
165 movq_cfi_restore 6*8, r11 162 movq 6*8(%rsp), %r11
166 .endif 163 .endif
167 .if \rstor_r8910 164 .if \rstor_r8910
168 movq_cfi_restore 7*8, r10 165 movq 7*8(%rsp), %r10
169 movq_cfi_restore 8*8, r9 166 movq 8*8(%rsp), %r9
170 movq_cfi_restore 9*8, r8 167 movq 9*8(%rsp), %r8
171 .endif 168 .endif
172 .if \rstor_rax 169 .if \rstor_rax
173 movq_cfi_restore 10*8, rax 170 movq 10*8(%rsp), %rax
174 .endif 171 .endif
175 .if \rstor_rcx 172 .if \rstor_rcx
176 movq_cfi_restore 11*8, rcx 173 movq 11*8(%rsp), %rcx
177 .endif 174 .endif
178 .if \rstor_rdx 175 .if \rstor_rdx
179 movq_cfi_restore 12*8, rdx 176 movq 12*8(%rsp), %rdx
180 .endif 177 .endif
181 movq_cfi_restore 13*8, rsi 178 movq 13*8(%rsp), %rsi
182 movq_cfi_restore 14*8, rdi 179 movq 14*8(%rsp), %rdi
183 .endm 180 .endm
184 .macro RESTORE_C_REGS 181 .macro RESTORE_C_REGS
185 RESTORE_C_REGS_HELPER 1,1,1,1,1 182 RESTORE_C_REGS_HELPER 1,1,1,1,1
@@ -204,8 +201,7 @@ For 32-bit we have the following conventions - kernel is built with
204 .endm 201 .endm
205 202
206 .macro REMOVE_PT_GPREGS_FROM_STACK addskip=0 203 .macro REMOVE_PT_GPREGS_FROM_STACK addskip=0
207 addq $15*8+\addskip, %rsp 204 subq $-(15*8+\addskip), %rsp
208 CFI_ADJUST_CFA_OFFSET -(15*8+\addskip)
209 .endm 205 .endm
210 206
211 .macro icebp 207 .macro icebp
@@ -224,23 +220,23 @@ For 32-bit we have the following conventions - kernel is built with
224 */ 220 */
225 221
226 .macro SAVE_ALL 222 .macro SAVE_ALL
227 pushl_cfi_reg eax 223 pushl %eax
228 pushl_cfi_reg ebp 224 pushl %ebp
229 pushl_cfi_reg edi 225 pushl %edi
230 pushl_cfi_reg esi 226 pushl %esi
231 pushl_cfi_reg edx 227 pushl %edx
232 pushl_cfi_reg ecx 228 pushl %ecx
233 pushl_cfi_reg ebx 229 pushl %ebx
234 .endm 230 .endm
235 231
236 .macro RESTORE_ALL 232 .macro RESTORE_ALL
237 popl_cfi_reg ebx 233 popl %ebx
238 popl_cfi_reg ecx 234 popl %ecx
239 popl_cfi_reg edx 235 popl %edx
240 popl_cfi_reg esi 236 popl %esi
241 popl_cfi_reg edi 237 popl %edi
242 popl_cfi_reg ebp 238 popl %ebp
243 popl_cfi_reg eax 239 popl %eax
244 .endm 240 .endm
245 241
246#endif /* CONFIG_X86_64 */ 242#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
new file mode 100644
index 000000000000..21dc60a60b5f
--- /dev/null
+++ b/arch/x86/entry/entry_32.S
@@ -0,0 +1,1248 @@
1/*
2 * Copyright (C) 1991,1992 Linus Torvalds
3 *
4 * entry_32.S contains the system-call and low-level fault and trap handling routines.
5 *
6 * Stack layout in 'syscall_exit':
7 * ptrace needs to have all registers on the stack.
8 * If the order here is changed, it needs to be
9 * updated in fork.c:copy_process(), signal.c:do_signal(),
10 * ptrace.c and ptrace.h
11 *
12 * 0(%esp) - %ebx
13 * 4(%esp) - %ecx
14 * 8(%esp) - %edx
15 * C(%esp) - %esi
16 * 10(%esp) - %edi
17 * 14(%esp) - %ebp
18 * 18(%esp) - %eax
19 * 1C(%esp) - %ds
20 * 20(%esp) - %es
21 * 24(%esp) - %fs
22 * 28(%esp) - %gs saved iff !CONFIG_X86_32_LAZY_GS
23 * 2C(%esp) - orig_eax
24 * 30(%esp) - %eip
25 * 34(%esp) - %cs
26 * 38(%esp) - %eflags
27 * 3C(%esp) - %oldesp
28 * 40(%esp) - %oldss
29 */
30
31#include <linux/linkage.h>
32#include <linux/err.h>
33#include <asm/thread_info.h>
34#include <asm/irqflags.h>
35#include <asm/errno.h>
36#include <asm/segment.h>
37#include <asm/smp.h>
38#include <asm/page_types.h>
39#include <asm/percpu.h>
40#include <asm/processor-flags.h>
41#include <asm/ftrace.h>
42#include <asm/irq_vectors.h>
43#include <asm/cpufeature.h>
44#include <asm/alternative-asm.h>
45#include <asm/asm.h>
46#include <asm/smap.h>
47
48/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
49#include <linux/elf-em.h>
50#define AUDIT_ARCH_I386 (EM_386|__AUDIT_ARCH_LE)
51#define __AUDIT_ARCH_LE 0x40000000
52
53#ifndef CONFIG_AUDITSYSCALL
54# define sysenter_audit syscall_trace_entry
55# define sysexit_audit syscall_exit_work
56#endif
57
58 .section .entry.text, "ax"
59
60/*
61 * We use macros for low-level operations which need to be overridden
62 * for paravirtualization. The following will never clobber any registers:
63 * INTERRUPT_RETURN (aka. "iret")
64 * GET_CR0_INTO_EAX (aka. "movl %cr0, %eax")
65 * ENABLE_INTERRUPTS_SYSEXIT (aka "sti; sysexit").
66 *
67 * For DISABLE_INTERRUPTS/ENABLE_INTERRUPTS (aka "cli"/"sti"), you must
68 * specify what registers can be overwritten (CLBR_NONE, CLBR_EAX/EDX/ECX/ANY).
69 * Allowing a register to be clobbered can shrink the paravirt replacement
70 * enough to patch inline, increasing performance.
71 */
72
73#ifdef CONFIG_PREEMPT
74# define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
75#else
76# define preempt_stop(clobbers)
77# define resume_kernel restore_all
78#endif
79
80.macro TRACE_IRQS_IRET
81#ifdef CONFIG_TRACE_IRQFLAGS
82 testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off?
83 jz 1f
84 TRACE_IRQS_ON
851:
86#endif
87.endm
88
89/*
90 * User gs save/restore
91 *
 92 * %gs is used for userland TLS; the kernel uses it only for the stack
 93 * canary, which gcc requires to be at %gs:20. Read the comment
94 * at the top of stackprotector.h for more info.
95 *
96 * Local labels 98 and 99 are used.
97 */
98#ifdef CONFIG_X86_32_LAZY_GS
99
100 /* unfortunately push/pop can't be no-op */
101.macro PUSH_GS
102 pushl $0
103.endm
104.macro POP_GS pop=0
105 addl $(4 + \pop), %esp
106.endm
107.macro POP_GS_EX
108.endm
109
110 /* all the rest are no-op */
111.macro PTGS_TO_GS
112.endm
113.macro PTGS_TO_GS_EX
114.endm
115.macro GS_TO_REG reg
116.endm
117.macro REG_TO_PTGS reg
118.endm
119.macro SET_KERNEL_GS reg
120.endm
121
122#else /* CONFIG_X86_32_LAZY_GS */
123
124.macro PUSH_GS
125 pushl %gs
126.endm
127
128.macro POP_GS pop=0
12998: popl %gs
130 .if \pop <> 0
131 add $\pop, %esp
132 .endif
133.endm
134.macro POP_GS_EX
135.pushsection .fixup, "ax"
13699: movl $0, (%esp)
137 jmp 98b
138.popsection
139 _ASM_EXTABLE(98b, 99b)
140.endm
141
142.macro PTGS_TO_GS
14398: mov PT_GS(%esp), %gs
144.endm
145.macro PTGS_TO_GS_EX
146.pushsection .fixup, "ax"
14799: movl $0, PT_GS(%esp)
148 jmp 98b
149.popsection
150 _ASM_EXTABLE(98b, 99b)
151.endm
152
153.macro GS_TO_REG reg
154 movl %gs, \reg
155.endm
156.macro REG_TO_PTGS reg
157 movl \reg, PT_GS(%esp)
158.endm
159.macro SET_KERNEL_GS reg
160 movl $(__KERNEL_STACK_CANARY), \reg
161 movl \reg, %gs
162.endm
163
164#endif /* CONFIG_X86_32_LAZY_GS */
165
166.macro SAVE_ALL
167 cld
168 PUSH_GS
169 pushl %fs
170 pushl %es
171 pushl %ds
172 pushl %eax
173 pushl %ebp
174 pushl %edi
175 pushl %esi
176 pushl %edx
177 pushl %ecx
178 pushl %ebx
179 movl $(__USER_DS), %edx
180 movl %edx, %ds
181 movl %edx, %es
182 movl $(__KERNEL_PERCPU), %edx
183 movl %edx, %fs
184 SET_KERNEL_GS %edx
185.endm
186
187.macro RESTORE_INT_REGS
188 popl %ebx
189 popl %ecx
190 popl %edx
191 popl %esi
192 popl %edi
193 popl %ebp
194 popl %eax
195.endm
196
197.macro RESTORE_REGS pop=0
198 RESTORE_INT_REGS
1991: popl %ds
2002: popl %es
2013: popl %fs
202 POP_GS \pop
203.pushsection .fixup, "ax"
2044: movl $0, (%esp)
205 jmp 1b
2065: movl $0, (%esp)
207 jmp 2b
2086: movl $0, (%esp)
209 jmp 3b
210.popsection
211 _ASM_EXTABLE(1b, 4b)
212 _ASM_EXTABLE(2b, 5b)
213 _ASM_EXTABLE(3b, 6b)
214 POP_GS_EX
215.endm
216
217ENTRY(ret_from_fork)
218 pushl %eax
219 call schedule_tail
220 GET_THREAD_INFO(%ebp)
221 popl %eax
222 pushl $0x0202 # Reset kernel eflags
223 popfl
224 jmp syscall_exit
225END(ret_from_fork)
226
227ENTRY(ret_from_kernel_thread)
228 pushl %eax
229 call schedule_tail
230 GET_THREAD_INFO(%ebp)
231 popl %eax
232 pushl $0x0202 # Reset kernel eflags
233 popfl
234 movl PT_EBP(%esp), %eax
235 call *PT_EBX(%esp)
236 movl $0, PT_EAX(%esp)
237 jmp syscall_exit
238ENDPROC(ret_from_kernel_thread)
239
240/*
241 * Return to user mode is not as complex as all this looks,
242 * but we want the default path for a system call return to
 243 * go as quickly as possible, which is why some of this is
244 * less clear than it otherwise should be.
245 */
246
247 # userspace resumption stub bypassing syscall exit tracing
248 ALIGN
249ret_from_exception:
250 preempt_stop(CLBR_ANY)
251ret_from_intr:
252 GET_THREAD_INFO(%ebp)
253#ifdef CONFIG_VM86
254 movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS
255 movb PT_CS(%esp), %al
256 andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
257#else
258 /*
 259 * We can be coming here from a child spawned by kernel_thread().
260 */
261 movl PT_CS(%esp), %eax
262 andl $SEGMENT_RPL_MASK, %eax
263#endif
264 cmpl $USER_RPL, %eax
265 jb resume_kernel # not returning to v8086 or userspace
266
267ENTRY(resume_userspace)
268 LOCKDEP_SYS_EXIT
269 DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
270 # setting need_resched or sigpending
271 # between sampling and the iret
272 TRACE_IRQS_OFF
273 movl TI_flags(%ebp), %ecx
274 andl $_TIF_WORK_MASK, %ecx # is there any work to be done on
275 # int/exception return?
276 jne work_pending
277 jmp restore_all
278END(ret_from_exception)
279
280#ifdef CONFIG_PREEMPT
281ENTRY(resume_kernel)
282 DISABLE_INTERRUPTS(CLBR_ANY)
283need_resched:
284 cmpl $0, PER_CPU_VAR(__preempt_count)
285 jnz restore_all
286 testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ?
287 jz restore_all
288 call preempt_schedule_irq
289 jmp need_resched
290END(resume_kernel)
291#endif
292
293/*
294 * SYSENTER_RETURN points to after the SYSENTER instruction
295 * in the vsyscall page. See vsyscall-sysentry.S, which defines
296 * the symbol.
297 */
298
299 # SYSENTER call handler stub
300ENTRY(entry_SYSENTER_32)
301 movl TSS_sysenter_sp0(%esp), %esp
302sysenter_past_esp:
303 /*
 304 * Interrupts are disabled here, but we can't trace that until
 305 * enough kernel state has been set up to call TRACE_IRQS_OFF -
 306 * and at that point we immediately enable interrupts again anyway.
307 */
308 pushl $__USER_DS
309 pushl %ebp
310 pushfl
311 orl $X86_EFLAGS_IF, (%esp)
312 pushl $__USER_CS
313 /*
314 * Push current_thread_info()->sysenter_return to the stack.
315 * A tiny bit of offset fixup is necessary: TI_sysenter_return
316 * is relative to thread_info, which is at the bottom of the
317 * kernel stack page. 4*4 means the 4 words pushed above;
318 * TOP_OF_KERNEL_STACK_PADDING takes us to the top of the stack;
319 * and THREAD_SIZE takes us to the bottom.
320 */
321 pushl ((TI_sysenter_return) - THREAD_SIZE + TOP_OF_KERNEL_STACK_PADDING + 4*4)(%esp)
322
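The operand of the pushl above can be sanity-checked by expanding it against the comment (a rough sketch; at this point %esp is 4*4 bytes below the sp0 value loaded at entry_SYSENTER_32, and the constants are whatever the headers define):

	/*  %esp                                  = sp0 - 4*4               */
	/*  sp0 + TOP_OF_KERNEL_STACK_PADDING     = top of the kernel stack */
	/*  top of the kernel stack - THREAD_SIZE = thread_info             */
	/* so the pushed value is read from                                  */
	/*  %esp + TI_sysenter_return - THREAD_SIZE                          */
	/*       + TOP_OF_KERNEL_STACK_PADDING + 4*4                         */
	/*  = thread_info + TI_sysenter_return                               */
	/*  = &current_thread_info()->sysenter_return                        */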
323 pushl %eax
324 SAVE_ALL
325 ENABLE_INTERRUPTS(CLBR_NONE)
326
327/*
328 * Load the potential sixth argument from user stack.
329 * Careful about security.
330 */
331 cmpl $__PAGE_OFFSET-3, %ebp
332 jae syscall_fault
333 ASM_STAC
3341: movl (%ebp), %ebp
335 ASM_CLAC
336 movl %ebp, PT_EBP(%esp)
337 _ASM_EXTABLE(1b, syscall_fault)
338
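The bound used by the cmpl above can be read as follows (a sketch of the arithmetic):

	/* the movl at 1: reads the 4 bytes at %ebp..%ebp+3; that read stays */
	/* below the kernel boundary only if %ebp + 3 < __PAGE_OFFSET, i.e.  */
	/* %ebp <= __PAGE_OFFSET - 4.  'jae syscall_fault' therefore rejects */
	/* %ebp >= __PAGE_OFFSET - 3 up front; a user pointer below that     */
	/* bound which still faults is handled by the _ASM_EXTABLE fixup.    */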
339 GET_THREAD_INFO(%ebp)
340
341 testl $_TIF_WORK_SYSCALL_ENTRY, TI_flags(%ebp)
342 jnz sysenter_audit
343sysenter_do_call:
344 cmpl $(NR_syscalls), %eax
345 jae sysenter_badsys
346 call *sys_call_table(, %eax, 4)
347sysenter_after_call:
348 movl %eax, PT_EAX(%esp)
349 LOCKDEP_SYS_EXIT
350 DISABLE_INTERRUPTS(CLBR_ANY)
351 TRACE_IRQS_OFF
352 movl TI_flags(%ebp), %ecx
353 testl $_TIF_ALLWORK_MASK, %ecx
354 jnz sysexit_audit
355sysenter_exit:
356/* if something modifies registers it must also disable sysexit */
357 movl PT_EIP(%esp), %edx
358 movl PT_OLDESP(%esp), %ecx
359 xorl %ebp, %ebp
360 TRACE_IRQS_ON
3611: mov PT_FS(%esp), %fs
362 PTGS_TO_GS
363 ENABLE_INTERRUPTS_SYSEXIT
364
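For context, the register choice above follows the SYSEXIT convention (sketch):

	/* SYSEXIT resumes user mode with EIP <- %edx and ESP <- %ecx, hence  */
	/* the loads of PT_EIP and PT_OLDESP into exactly those registers;    */
	/* %ebp is zeroed so the thread_info pointer it held is not leaked.   */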
365#ifdef CONFIG_AUDITSYSCALL
366sysenter_audit:
367 testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), TI_flags(%ebp)
368 jnz syscall_trace_entry
369 /* movl PT_EAX(%esp), %eax already set, syscall number: 1st arg to audit */
370 movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
 371 /* movl PT_ECX(%esp), %ecx already set, a1: 3rd arg to audit */
372 pushl PT_ESI(%esp) /* a3: 5th arg */
373 pushl PT_EDX+4(%esp) /* a2: 4th arg */
374 call __audit_syscall_entry
375 popl %ecx /* get that remapped edx off the stack */
376 popl %ecx /* get that remapped esi off the stack */
377 movl PT_EAX(%esp), %eax /* reload syscall number */
378 jmp sysenter_do_call
379
380sysexit_audit:
381 testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
382 jnz syscall_exit_work
383 TRACE_IRQS_ON
384 ENABLE_INTERRUPTS(CLBR_ANY)
385 movl %eax, %edx /* second arg, syscall return value */
386 cmpl $-MAX_ERRNO, %eax /* is it an error ? */
387 setbe %al /* 1 if so, 0 if not */
388 movzbl %al, %eax /* zero-extend that */
389 call __audit_syscall_exit
390 DISABLE_INTERRUPTS(CLBR_ANY)
391 TRACE_IRQS_OFF
392 movl TI_flags(%ebp), %ecx
393 testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
394 jnz syscall_exit_work
395 movl PT_EAX(%esp), %eax /* reload syscall return value */
396 jmp sysenter_exit
397#endif
398
399.pushsection .fixup, "ax"
4002: movl $0, PT_FS(%esp)
401 jmp 1b
402.popsection
403 _ASM_EXTABLE(1b, 2b)
404 PTGS_TO_GS_EX
405ENDPROC(entry_SYSENTER_32)
406
407 # system call handler stub
408ENTRY(entry_INT80_32)
409 ASM_CLAC
410 pushl %eax # save orig_eax
411 SAVE_ALL
412 GET_THREAD_INFO(%ebp)
413 # system call tracing in operation / emulation
414 testl $_TIF_WORK_SYSCALL_ENTRY, TI_flags(%ebp)
415 jnz syscall_trace_entry
416 cmpl $(NR_syscalls), %eax
417 jae syscall_badsys
418syscall_call:
419 call *sys_call_table(, %eax, 4)
420syscall_after_call:
421 movl %eax, PT_EAX(%esp) # store the return value
422syscall_exit:
423 LOCKDEP_SYS_EXIT
424 DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
425 # setting need_resched or sigpending
426 # between sampling and the iret
427 TRACE_IRQS_OFF
428 movl TI_flags(%ebp), %ecx
429 testl $_TIF_ALLWORK_MASK, %ecx # current->work
430 jnz syscall_exit_work
431
432restore_all:
433 TRACE_IRQS_IRET
434restore_all_notrace:
435#ifdef CONFIG_X86_ESPFIX32
436 movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS
437 /*
438 * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
439 * are returning to the kernel.
440 * See comments in process.c:copy_thread() for details.
441 */
442 movb PT_OLDSS(%esp), %ah
443 movb PT_CS(%esp), %al
444 andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
445 cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
446 je ldt_ss # returning to user-space with LDT SS
447#endif
448restore_nocheck:
449 RESTORE_REGS 4 # skip orig_eax/error_code
450irq_return:
451 INTERRUPT_RETURN
452.section .fixup, "ax"
453ENTRY(iret_exc )
454 pushl $0 # no error code
455 pushl $do_iret_error
456 jmp error_code
457.previous
458 _ASM_EXTABLE(irq_return, iret_exc)
459
460#ifdef CONFIG_X86_ESPFIX32
461ldt_ss:
462#ifdef CONFIG_PARAVIRT
463 /*
464 * The kernel can't run on a non-flat stack if paravirt mode
465 * is active. Rather than try to fixup the high bits of
466 * ESP, bypass this code entirely. This may break DOSemu
467 * and/or Wine support in a paravirt VM, although the option
468 * is still available to implement the setting of the high
469 * 16-bits in the INTERRUPT_RETURN paravirt-op.
470 */
471 cmpl $0, pv_info+PARAVIRT_enabled
472 jne restore_nocheck
473#endif
474
475/*
476 * Setup and switch to ESPFIX stack
477 *
478 * We're returning to userspace with a 16 bit stack. The CPU will not
479 * restore the high word of ESP for us on executing iret... This is an
480 * "official" bug of all the x86-compatible CPUs, which we can work
481 * around to make dosemu and wine happy. We do this by preloading the
482 * high word of ESP with the high word of the userspace ESP while
483 * compensating for the offset by changing to the ESPFIX segment with
484 * a base address that matches for the difference.
485 */
486#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
487 mov %esp, %edx /* load kernel esp */
488 mov PT_OLDESP(%esp), %eax /* load userspace esp */
489 mov %dx, %ax /* eax: new kernel esp */
490 sub %eax, %edx /* offset (low word is 0) */
491 shr $16, %edx
492 mov %dl, GDT_ESPFIX_SS + 4 /* bits 16..23 */
493 mov %dh, GDT_ESPFIX_SS + 7 /* bits 24..31 */
494 pushl $__ESPFIX_SS
495 pushl %eax /* new kernel esp */
496 /*
497 * Disable interrupts, but do not irqtrace this section: we
498 * will soon execute iret and the tracer was already set to
499 * the irqstate after the IRET:
500 */
501 DISABLE_INTERRUPTS(CLBR_EAX)
502 lss (%esp), %esp /* switch to espfix segment */
503 jmp restore_nocheck
504#endif
505ENDPROC(entry_INT80_32)
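To make the base computation in ldt_ss above concrete, a hypothetical example (the addresses are invented for illustration, and the low half of the ESPFIX_SS base is assumed to be zero):

	/* kernel %esp = 0xf60b5f54,  user PT_OLDESP = 0x12345678            */
	/* mov %dx, %ax    -> %eax = 0x12345f54   (new kernel esp)           */
	/* sub %eax, %edx  -> %edx = 0xe3d70000   (offset, low word 0)       */
	/* base[31:16] of the ESPFIX_SS GDT entry is set to 0xe3d7, so       */
	/* 0xe3d70000 + 0x12345f54 = 0xf60b5f54: the linear stack address    */
	/* is unchanged, while %esp now carries the user's high word.        */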
506
507 # perform work that needs to be done immediately before resumption
508 ALIGN
509work_pending:
510 testb $_TIF_NEED_RESCHED, %cl
511 jz work_notifysig
512work_resched:
513 call schedule
514 LOCKDEP_SYS_EXIT
515 DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
516 # setting need_resched or sigpending
517 # between sampling and the iret
518 TRACE_IRQS_OFF
519 movl TI_flags(%ebp), %ecx
520 andl $_TIF_WORK_MASK, %ecx # is there any work to be done other
521 # than syscall tracing?
522 jz restore_all
523 testb $_TIF_NEED_RESCHED, %cl
524 jnz work_resched
525
526work_notifysig: # deal with pending signals and
527 # notify-resume requests
528#ifdef CONFIG_VM86
529 testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)
530 movl %esp, %eax
531 jnz work_notifysig_v86 # returning to kernel-space or
532 # vm86-space
5331:
534#else
535 movl %esp, %eax
536#endif
537 TRACE_IRQS_ON
538 ENABLE_INTERRUPTS(CLBR_NONE)
539 movb PT_CS(%esp), %bl
540 andb $SEGMENT_RPL_MASK, %bl
541 cmpb $USER_RPL, %bl
542 jb resume_kernel
543 xorl %edx, %edx
544 call do_notify_resume
545 jmp resume_userspace
546
547#ifdef CONFIG_VM86
548 ALIGN
549work_notifysig_v86:
550 pushl %ecx # save ti_flags for do_notify_resume
551 call save_v86_state # %eax contains pt_regs pointer
552 popl %ecx
553 movl %eax, %esp
554 jmp 1b
555#endif
556END(work_pending)
557
558 # perform syscall exit tracing
559 ALIGN
560syscall_trace_entry:
561 movl $-ENOSYS, PT_EAX(%esp)
562 movl %esp, %eax
563 call syscall_trace_enter
564 /* What it returned is what we'll actually use. */
565 cmpl $(NR_syscalls), %eax
566 jnae syscall_call
567 jmp syscall_exit
568END(syscall_trace_entry)
569
570 # perform syscall exit tracing
571 ALIGN
572syscall_exit_work:
573 testl $_TIF_WORK_SYSCALL_EXIT, %ecx
574 jz work_pending
575 TRACE_IRQS_ON
576 ENABLE_INTERRUPTS(CLBR_ANY) # could let syscall_trace_leave() call
577 # schedule() instead
578 movl %esp, %eax
579 call syscall_trace_leave
580 jmp resume_userspace
581END(syscall_exit_work)
582
583syscall_fault:
584 ASM_CLAC
585 GET_THREAD_INFO(%ebp)
586 movl $-EFAULT, PT_EAX(%esp)
587 jmp resume_userspace
588END(syscall_fault)
589
590syscall_badsys:
591 movl $-ENOSYS, %eax
592 jmp syscall_after_call
593END(syscall_badsys)
594
595sysenter_badsys:
596 movl $-ENOSYS, %eax
597 jmp sysenter_after_call
598END(sysenter_badsys)
599
600.macro FIXUP_ESPFIX_STACK
601/*
 602 * Switch back from the ESPFIX stack to the normal zero-based stack
603 *
604 * We can't call C functions using the ESPFIX stack. This code reads
 605 * the high word of the segment base from the GDT and switches to the
606 * normal stack and adjusts ESP with the matching offset.
607 */
608#ifdef CONFIG_X86_ESPFIX32
609 /* fixup the stack */
610 mov GDT_ESPFIX_SS + 4, %al /* bits 16..23 */
611 mov GDT_ESPFIX_SS + 7, %ah /* bits 24..31 */
612 shl $16, %eax
613 addl %esp, %eax /* the adjusted stack pointer */
614 pushl $__KERNEL_DS
615 pushl %eax
616 lss (%esp), %esp /* switch to the normal stack segment */
617#endif
618.endm
619.macro UNWIND_ESPFIX_STACK
620#ifdef CONFIG_X86_ESPFIX32
621 movl %ss, %eax
622 /* see if on espfix stack */
623 cmpw $__ESPFIX_SS, %ax
624 jne 27f
625 movl $__KERNEL_DS, %eax
626 movl %eax, %ds
627 movl %eax, %es
628 /* switch to normal stack */
629 FIXUP_ESPFIX_STACK
63027:
631#endif
632.endm
633
634/*
635 * Build the entry stubs with some assembler magic.
636 * We pack 1 stub into every 8-byte block.
637 */
638 .align 8
639ENTRY(irq_entries_start)
640 vector=FIRST_EXTERNAL_VECTOR
641 .rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
642 pushl $(~vector+0x80) /* Note: always in signed byte range */
643 vector=vector+1
644 jmp common_interrupt
645 .align 8
646 .endr
647END(irq_entries_start)
648
649/*
650 * the CPU automatically disables interrupts when executing an IRQ vector,
651 * so IRQ-flags tracing has to follow that:
652 */
653 .p2align CONFIG_X86_L1_CACHE_SHIFT
654common_interrupt:
655 ASM_CLAC
656 addl $-0x80, (%esp) /* Adjust vector into the [-256, -1] range */
657 SAVE_ALL
658 TRACE_IRQS_OFF
659 movl %esp, %eax
660 call do_IRQ
661 jmp ret_from_intr
662ENDPROC(common_interrupt)
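A worked example of the vector encoding used by the stubs above (assuming FIRST_EXTERNAL_VECTOR is 0x20):

	/* irq_entries_start stub for vector 0x20:                           */
	/*   pushl $(~0x20 + 0x80)   = pushl $0x5f                           */
	/*   (0x5f fits in a signed byte, so the push is 2 bytes and the     */
	/*    push+jmp pair fits in the 8-byte block)                         */
	/* common_interrupt then does:                                        */
	/*   addl $-0x80, (%esp)     -> 0x5f - 0x80 = -0x21 = ~0x20           */
	/* so the saved orig_eax lands in [-256, -1] and the handler can      */
	/* recover the vector by complementing it.                            */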
663
664#define BUILD_INTERRUPT3(name, nr, fn) \
665ENTRY(name) \
666 ASM_CLAC; \
667 pushl $~(nr); \
668 SAVE_ALL; \
669 TRACE_IRQS_OFF \
670 movl %esp, %eax; \
671 call fn; \
672 jmp ret_from_intr; \
673ENDPROC(name)
674
675
676#ifdef CONFIG_TRACING
677# define TRACE_BUILD_INTERRUPT(name, nr) BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name)
678#else
679# define TRACE_BUILD_INTERRUPT(name, nr)
680#endif
681
682#define BUILD_INTERRUPT(name, nr) \
683 BUILD_INTERRUPT3(name, nr, smp_##name); \
684 TRACE_BUILD_INTERRUPT(name, nr)
685
686/* The include is where all of the SMP etc. interrupts come from */
687#include <asm/entry_arch.h>
688
689ENTRY(coprocessor_error)
690 ASM_CLAC
691 pushl $0
692 pushl $do_coprocessor_error
693 jmp error_code
694END(coprocessor_error)
695
696ENTRY(simd_coprocessor_error)
697 ASM_CLAC
698 pushl $0
699#ifdef CONFIG_X86_INVD_BUG
700 /* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
701 ALTERNATIVE "pushl $do_general_protection", \
702 "pushl $do_simd_coprocessor_error", \
703 X86_FEATURE_XMM
704#else
705 pushl $do_simd_coprocessor_error
706#endif
707 jmp error_code
708END(simd_coprocessor_error)
709
710ENTRY(device_not_available)
711 ASM_CLAC
712 pushl $-1 # mark this as an int
713 pushl $do_device_not_available
714 jmp error_code
715END(device_not_available)
716
717#ifdef CONFIG_PARAVIRT
718ENTRY(native_iret)
719 iret
720 _ASM_EXTABLE(native_iret, iret_exc)
721END(native_iret)
722
723ENTRY(native_irq_enable_sysexit)
724 sti
725 sysexit
726END(native_irq_enable_sysexit)
727#endif
728
729ENTRY(overflow)
730 ASM_CLAC
731 pushl $0
732 pushl $do_overflow
733 jmp error_code
734END(overflow)
735
736ENTRY(bounds)
737 ASM_CLAC
738 pushl $0
739 pushl $do_bounds
740 jmp error_code
741END(bounds)
742
743ENTRY(invalid_op)
744 ASM_CLAC
745 pushl $0
746 pushl $do_invalid_op
747 jmp error_code
748END(invalid_op)
749
750ENTRY(coprocessor_segment_overrun)
751 ASM_CLAC
752 pushl $0
753 pushl $do_coprocessor_segment_overrun
754 jmp error_code
755END(coprocessor_segment_overrun)
756
757ENTRY(invalid_TSS)
758 ASM_CLAC
759 pushl $do_invalid_TSS
760 jmp error_code
761END(invalid_TSS)
762
763ENTRY(segment_not_present)
764 ASM_CLAC
765 pushl $do_segment_not_present
766 jmp error_code
767END(segment_not_present)
768
769ENTRY(stack_segment)
770 ASM_CLAC
771 pushl $do_stack_segment
772 jmp error_code
773END(stack_segment)
774
775ENTRY(alignment_check)
776 ASM_CLAC
777 pushl $do_alignment_check
778 jmp error_code
779END(alignment_check)
780
781ENTRY(divide_error)
782 ASM_CLAC
783 pushl $0 # no error code
784 pushl $do_divide_error
785 jmp error_code
786END(divide_error)
787
788#ifdef CONFIG_X86_MCE
789ENTRY(machine_check)
790 ASM_CLAC
791 pushl $0
792 pushl machine_check_vector
793 jmp error_code
794END(machine_check)
795#endif
796
797ENTRY(spurious_interrupt_bug)
798 ASM_CLAC
799 pushl $0
800 pushl $do_spurious_interrupt_bug
801 jmp error_code
802END(spurious_interrupt_bug)
803
804#ifdef CONFIG_XEN
805/*
806 * Xen doesn't set %esp to be precisely what the normal SYSENTER
807 * entry point expects, so fix it up before using the normal path.
808 */
809ENTRY(xen_sysenter_target)
810 addl $5*4, %esp /* remove xen-provided frame */
811 jmp sysenter_past_esp
812
813ENTRY(xen_hypervisor_callback)
814 pushl $-1 /* orig_ax = -1 => not a system call */
815 SAVE_ALL
816 TRACE_IRQS_OFF
817
818 /*
819 * Check to see if we got the event in the critical
820 * region in xen_iret_direct, after we've reenabled
821 * events and checked for pending events. This simulates
822 * iret instruction's behaviour where it delivers a
823 * pending interrupt when enabling interrupts:
824 */
825 movl PT_EIP(%esp), %eax
826 cmpl $xen_iret_start_crit, %eax
827 jb 1f
828 cmpl $xen_iret_end_crit, %eax
829 jae 1f
830
831 jmp xen_iret_crit_fixup
832
833ENTRY(xen_do_upcall)
8341: mov %esp, %eax
835 call xen_evtchn_do_upcall
836#ifndef CONFIG_PREEMPT
837 call xen_maybe_preempt_hcall
838#endif
839 jmp ret_from_intr
840ENDPROC(xen_hypervisor_callback)
841
842/*
843 * Hypervisor uses this for application faults while it executes.
844 * We get here for two reasons:
845 * 1. Fault while reloading DS, ES, FS or GS
846 * 2. Fault while executing IRET
847 * Category 1 we fix up by reattempting the load, and zeroing the segment
848 * register if the load fails.
849 * Category 2 we fix up by jumping to do_iret_error. We cannot use the
850 * normal Linux return path in this case because if we use the IRET hypercall
851 * to pop the stack frame we end up in an infinite loop of failsafe callbacks.
852 * We distinguish between categories by maintaining a status value in EAX.
853 */
854ENTRY(xen_failsafe_callback)
855 pushl %eax
856 movl $1, %eax
8571: mov 4(%esp), %ds
8582: mov 8(%esp), %es
8593: mov 12(%esp), %fs
8604: mov 16(%esp), %gs
861 /* EAX == 0 => Category 1 (Bad segment)
862 EAX != 0 => Category 2 (Bad IRET) */
863 testl %eax, %eax
864 popl %eax
865 lea 16(%esp), %esp
866 jz 5f
867 jmp iret_exc
8685: pushl $-1 /* orig_ax = -1 => not a system call */
869 SAVE_ALL
870 jmp ret_from_exception
871
872.section .fixup, "ax"
8736: xorl %eax, %eax
874 movl %eax, 4(%esp)
875 jmp 1b
8767: xorl %eax, %eax
877 movl %eax, 8(%esp)
878 jmp 2b
8798: xorl %eax, %eax
880 movl %eax, 12(%esp)
881 jmp 3b
8829: xorl %eax, %eax
883 movl %eax, 16(%esp)
884 jmp 4b
885.previous
886 _ASM_EXTABLE(1b, 6b)
887 _ASM_EXTABLE(2b, 7b)
888 _ASM_EXTABLE(3b, 8b)
889 _ASM_EXTABLE(4b, 9b)
890ENDPROC(xen_failsafe_callback)
891
892BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
893 xen_evtchn_do_upcall)
894
895#endif /* CONFIG_XEN */
896
897#if IS_ENABLED(CONFIG_HYPERV)
898
899BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
900 hyperv_vector_handler)
901
902#endif /* CONFIG_HYPERV */
903
904#ifdef CONFIG_FUNCTION_TRACER
905#ifdef CONFIG_DYNAMIC_FTRACE
906
907ENTRY(mcount)
908 ret
909END(mcount)
910
911ENTRY(ftrace_caller)
912 pushl %eax
913 pushl %ecx
914 pushl %edx
915 pushl $0 /* Pass NULL as regs pointer */
916 movl 4*4(%esp), %eax
917 movl 0x4(%ebp), %edx
918 movl function_trace_op, %ecx
919 subl $MCOUNT_INSN_SIZE, %eax
920
921.globl ftrace_call
922ftrace_call:
923 call ftrace_stub
924
925 addl $4, %esp /* skip NULL pointer */
926 popl %edx
927 popl %ecx
928 popl %eax
929ftrace_ret:
930#ifdef CONFIG_FUNCTION_GRAPH_TRACER
931.globl ftrace_graph_call
932ftrace_graph_call:
933 jmp ftrace_stub
934#endif
935
936.globl ftrace_stub
937ftrace_stub:
938 ret
939END(ftrace_caller)
940
941ENTRY(ftrace_regs_caller)
942 pushf /* push flags before compare (in cs location) */
943
944 /*
945 * i386 does not save SS and ESP when coming from kernel.
946 * Instead, to get sp, &regs->sp is used (see ptrace.h).
947 * Unfortunately, that means eflags must be at the same location
948 * as the current return ip is. We move the return ip into the
949 * ip location, and move flags into the return ip location.
950 */
951 pushl 4(%esp) /* save return ip into ip slot */
952
953 pushl $0 /* Load 0 into orig_ax */
954 pushl %gs
955 pushl %fs
956 pushl %es
957 pushl %ds
958 pushl %eax
959 pushl %ebp
960 pushl %edi
961 pushl %esi
962 pushl %edx
963 pushl %ecx
964 pushl %ebx
965
966 movl 13*4(%esp), %eax /* Get the saved flags */
967 movl %eax, 14*4(%esp) /* Move saved flags into regs->flags location */
968 /* clobbering return ip */
969 movl $__KERNEL_CS, 13*4(%esp)
970
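For reference, the slots touched by the fixup above, once all of the pushes are done (offsets from %esp; this assumes the usual 32-bit pt_regs layout with ip/cs/flags at 12*4, 13*4 and 14*4):

	/* before the two movl's:                                             */
	/*   12*4(%esp) = copy of the return ip  (already in regs->ip)        */
	/*   13*4(%esp) = flags saved by pushf   (sitting in the cs slot)     */
	/*   14*4(%esp) = original return ip     (sitting in the flags slot)  */
	/* after them: 13*4 = __KERNEL_CS, 14*4 = flags, so ip/cs/flags all   */
	/* line up with struct pt_regs.                                        */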
971 movl 12*4(%esp), %eax /* Load ip (1st parameter) */
972 subl $MCOUNT_INSN_SIZE, %eax /* Adjust ip */
973 movl 0x4(%ebp), %edx /* Load parent ip (2nd parameter) */
974 movl function_trace_op, %ecx /* Save ftrace_pos in 3rd parameter */
975 pushl %esp /* Save pt_regs as 4th parameter */
976
977GLOBAL(ftrace_regs_call)
978 call ftrace_stub
979
980 addl $4, %esp /* Skip pt_regs */
981 movl 14*4(%esp), %eax /* Move flags back into cs */
982 movl %eax, 13*4(%esp) /* Needed to keep addl from modifying flags */
983 movl 12*4(%esp), %eax /* Get return ip from regs->ip */
984 movl %eax, 14*4(%esp) /* Put return ip back for ret */
985
986 popl %ebx
987 popl %ecx
988 popl %edx
989 popl %esi
990 popl %edi
991 popl %ebp
992 popl %eax
993 popl %ds
994 popl %es
995 popl %fs
996 popl %gs
997 addl $8, %esp /* Skip orig_ax and ip */
998 popf /* Pop flags at end (no addl to corrupt flags) */
999 jmp ftrace_ret
1000
1001 popf
1002 jmp ftrace_stub
1003#else /* ! CONFIG_DYNAMIC_FTRACE */
1004
1005ENTRY(mcount)
1006 cmpl $__PAGE_OFFSET, %esp
1007 jb ftrace_stub /* Paging not enabled yet? */
1008
1009 cmpl $ftrace_stub, ftrace_trace_function
1010 jnz trace
1011#ifdef CONFIG_FUNCTION_GRAPH_TRACER
1012 cmpl $ftrace_stub, ftrace_graph_return
1013 jnz ftrace_graph_caller
1014
1015 cmpl $ftrace_graph_entry_stub, ftrace_graph_entry
1016 jnz ftrace_graph_caller
1017#endif
1018.globl ftrace_stub
1019ftrace_stub:
1020 ret
1021
1022 /* taken from glibc */
1023trace:
1024 pushl %eax
1025 pushl %ecx
1026 pushl %edx
1027 movl 0xc(%esp), %eax
1028 movl 0x4(%ebp), %edx
1029 subl $MCOUNT_INSN_SIZE, %eax
1030
1031 call *ftrace_trace_function
1032
1033 popl %edx
1034 popl %ecx
1035 popl %eax
1036 jmp ftrace_stub
1037END(mcount)
1038#endif /* CONFIG_DYNAMIC_FTRACE */
1039#endif /* CONFIG_FUNCTION_TRACER */
1040
1041#ifdef CONFIG_FUNCTION_GRAPH_TRACER
1042ENTRY(ftrace_graph_caller)
1043 pushl %eax
1044 pushl %ecx
1045 pushl %edx
1046 movl 0xc(%esp), %eax
1047 lea 0x4(%ebp), %edx
1048 movl (%ebp), %ecx
1049 subl $MCOUNT_INSN_SIZE, %eax
1050 call prepare_ftrace_return
1051 popl %edx
1052 popl %ecx
1053 popl %eax
1054 ret
1055END(ftrace_graph_caller)
1056
1057.globl return_to_handler
1058return_to_handler:
1059 pushl %eax
1060 pushl %edx
1061 movl %ebp, %eax
1062 call ftrace_return_to_handler
1063 movl %eax, %ecx
1064 popl %edx
1065 popl %eax
1066 jmp *%ecx
1067#endif
1068
1069#ifdef CONFIG_TRACING
1070ENTRY(trace_page_fault)
1071 ASM_CLAC
1072 pushl $trace_do_page_fault
1073 jmp error_code
1074END(trace_page_fault)
1075#endif
1076
1077ENTRY(page_fault)
1078 ASM_CLAC
1079 pushl $do_page_fault
1080 ALIGN
1081error_code:
1082 /* the function address is in %gs's slot on the stack */
1083 pushl %fs
1084 pushl %es
1085 pushl %ds
1086 pushl %eax
1087 pushl %ebp
1088 pushl %edi
1089 pushl %esi
1090 pushl %edx
1091 pushl %ecx
1092 pushl %ebx
1093 cld
1094 movl $(__KERNEL_PERCPU), %ecx
1095 movl %ecx, %fs
1096 UNWIND_ESPFIX_STACK
1097 GS_TO_REG %ecx
1098 movl PT_GS(%esp), %edi # get the function address
1099 movl PT_ORIG_EAX(%esp), %edx # get the error code
1100 movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart
1101 REG_TO_PTGS %ecx
1102 SET_KERNEL_GS %ecx
1103 movl $(__USER_DS), %ecx
1104 movl %ecx, %ds
1105 movl %ecx, %es
1106 TRACE_IRQS_OFF
1107 movl %esp, %eax # pt_regs pointer
1108 call *%edi
1109 jmp ret_from_exception
1110END(page_fault)
1111
1112/*
1113 * Debug traps and NMI can happen at the one SYSENTER instruction
1114 * that sets up the real kernel stack. Check here, since we can't
1115 * allow the wrong stack to be used.
1116 *
1117 * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have
1118 * already pushed 3 words if it hits on the sysenter instruction:
1119 * eflags, cs and eip.
1120 *
1121 * We just load the right stack, and push the three (known) values
1122 * by hand onto the new stack - while updating the return eip past
1123 * the instruction that would have done it for sysenter.
1124 */
1125.macro FIX_STACK offset ok label
1126 cmpw $__KERNEL_CS, 4(%esp)
1127 jne \ok
1128\label:
1129 movl TSS_sysenter_sp0 + \offset(%esp), %esp
1130 pushfl
1131 pushl $__KERNEL_CS
1132 pushl $sysenter_past_esp
1133.endm
1134
1135ENTRY(debug)
1136 ASM_CLAC
1137 cmpl $entry_SYSENTER_32, (%esp)
1138 jne debug_stack_correct
1139 FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
1140debug_stack_correct:
1141 pushl $-1 # mark this as an int
1142 SAVE_ALL
1143 TRACE_IRQS_OFF
1144 xorl %edx, %edx # error code 0
1145 movl %esp, %eax # pt_regs pointer
1146 call do_debug
1147 jmp ret_from_exception
1148END(debug)
1149
1150/*
1151 * NMI is doubly nasty. It can happen _while_ we're handling
1152 * a debug fault, and the debug fault hasn't yet been able to
1153 * clear up the stack. So we first check whether we got an
1154 * NMI on the sysenter entry path, but after that we need to
1155 * check whether we got an NMI on the debug path where the debug
1156 * fault happened on the sysenter path.
1157 */
1158ENTRY(nmi)
1159 ASM_CLAC
1160#ifdef CONFIG_X86_ESPFIX32
1161 pushl %eax
1162 movl %ss, %eax
1163 cmpw $__ESPFIX_SS, %ax
1164 popl %eax
1165 je nmi_espfix_stack
1166#endif
1167 cmpl $entry_SYSENTER_32, (%esp)
1168 je nmi_stack_fixup
1169 pushl %eax
1170 movl %esp, %eax
1171 /*
1172 * Do not access memory above the end of our stack page,
1173 * it might not exist.
1174 */
1175 andl $(THREAD_SIZE-1), %eax
1176 cmpl $(THREAD_SIZE-20), %eax
1177 popl %eax
1178 jae nmi_stack_correct
1179 cmpl $entry_SYSENTER_32, 12(%esp)
1180 je nmi_debug_stack_check
1181nmi_stack_correct:
1182 pushl %eax
1183 SAVE_ALL
1184 xorl %edx, %edx # zero error code
1185 movl %esp, %eax # pt_regs pointer
1186 call do_nmi
1187 jmp restore_all_notrace
1188
1189nmi_stack_fixup:
1190 FIX_STACK 12, nmi_stack_correct, 1
1191 jmp nmi_stack_correct
1192
1193nmi_debug_stack_check:
1194 cmpw $__KERNEL_CS, 16(%esp)
1195 jne nmi_stack_correct
1196 cmpl $debug, (%esp)
1197 jb nmi_stack_correct
1198 cmpl $debug_esp_fix_insn, (%esp)
1199 ja nmi_stack_correct
1200 FIX_STACK 24, nmi_stack_correct, 1
1201 jmp nmi_stack_correct
1202
1203#ifdef CONFIG_X86_ESPFIX32
1204nmi_espfix_stack:
1205 /*
 1206 * create the ss:esp pointer that 'lss' uses to switch back later
1207 */
1208 pushl %ss
1209 pushl %esp
1210 addl $4, (%esp)
1211 /* copy the iret frame of 12 bytes */
1212 .rept 3
1213 pushl 16(%esp)
1214 .endr
1215 pushl %eax
1216 SAVE_ALL
1217 FIXUP_ESPFIX_STACK # %eax == %esp
1218 xorl %edx, %edx # zero error code
1219 call do_nmi
1220 RESTORE_REGS
1221 lss 12+4(%esp), %esp # back to espfix stack
1222 jmp irq_return
1223#endif
1224END(nmi)
1225
1226ENTRY(int3)
1227 ASM_CLAC
1228 pushl $-1 # mark this as an int
1229 SAVE_ALL
1230 TRACE_IRQS_OFF
1231 xorl %edx, %edx # zero error code
1232 movl %esp, %eax # pt_regs pointer
1233 call do_int3
1234 jmp ret_from_exception
1235END(int3)
1236
1237ENTRY(general_protection)
1238 pushl $do_general_protection
1239 jmp error_code
1240END(general_protection)
1241
1242#ifdef CONFIG_KVM_GUEST
1243ENTRY(async_page_fault)
1244 ASM_CLAC
1245 pushl $do_async_page_fault
1246 jmp error_code
1247END(async_page_fault)
1248#endif
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/entry/entry_64.S
index 02c2eff7478d..3bb2c4302df1 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -4,34 +4,25 @@
4 * Copyright (C) 1991, 1992 Linus Torvalds 4 * Copyright (C) 1991, 1992 Linus Torvalds
5 * Copyright (C) 2000, 2001, 2002 Andi Kleen SuSE Labs 5 * Copyright (C) 2000, 2001, 2002 Andi Kleen SuSE Labs
6 * Copyright (C) 2000 Pavel Machek <pavel@suse.cz> 6 * Copyright (C) 2000 Pavel Machek <pavel@suse.cz>
7 */ 7 *
8
9/*
10 * entry.S contains the system-call and fault low-level handling routines. 8 * entry.S contains the system-call and fault low-level handling routines.
11 * 9 *
12 * Some of this is documented in Documentation/x86/entry_64.txt 10 * Some of this is documented in Documentation/x86/entry_64.txt
13 * 11 *
14 * NOTE: This code handles signal-recognition, which happens every time
15 * after an interrupt and after each system call.
16 *
17 * A note on terminology: 12 * A note on terminology:
18 * - iret frame: Architecture defined interrupt frame from SS to RIP 13 * - iret frame: Architecture defined interrupt frame from SS to RIP
19 * at the top of the kernel process stack. 14 * at the top of the kernel process stack.
20 * 15 *
21 * Some macro usage: 16 * Some macro usage:
22 * - CFI macros are used to generate dwarf2 unwind information for better 17 * - ENTRY/END: Define functions in the symbol table.
23 * backtraces. They don't change any code. 18 * - TRACE_IRQ_*: Trace hardirq state for lock debugging.
24 * - ENTRY/END Define functions in the symbol table. 19 * - idtentry: Define exception entry points.
25 * - TRACE_IRQ_* - Trace hard interrupt state for lock debugging.
26 * - idtentry - Define exception entry points.
27 */ 20 */
28
29#include <linux/linkage.h> 21#include <linux/linkage.h>
30#include <asm/segment.h> 22#include <asm/segment.h>
31#include <asm/cache.h> 23#include <asm/cache.h>
32#include <asm/errno.h> 24#include <asm/errno.h>
33#include <asm/dwarf2.h> 25#include "calling.h"
34#include <asm/calling.h>
35#include <asm/asm-offsets.h> 26#include <asm/asm-offsets.h>
36#include <asm/msr.h> 27#include <asm/msr.h>
37#include <asm/unistd.h> 28#include <asm/unistd.h>
@@ -49,13 +40,12 @@
49 40
50/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */ 41/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
51#include <linux/elf-em.h> 42#include <linux/elf-em.h>
52#define AUDIT_ARCH_X86_64 (EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE) 43#define AUDIT_ARCH_X86_64 (EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
53#define __AUDIT_ARCH_64BIT 0x80000000 44#define __AUDIT_ARCH_64BIT 0x80000000
54#define __AUDIT_ARCH_LE 0x40000000 45#define __AUDIT_ARCH_LE 0x40000000
55
56 .code64
57 .section .entry.text, "ax"
58 46
47.code64
48.section .entry.text, "ax"
59 49
60#ifdef CONFIG_PARAVIRT 50#ifdef CONFIG_PARAVIRT
61ENTRY(native_usergs_sysret64) 51ENTRY(native_usergs_sysret64)
@@ -64,11 +54,10 @@ ENTRY(native_usergs_sysret64)
64ENDPROC(native_usergs_sysret64) 54ENDPROC(native_usergs_sysret64)
65#endif /* CONFIG_PARAVIRT */ 55#endif /* CONFIG_PARAVIRT */
66 56
67
68.macro TRACE_IRQS_IRETQ 57.macro TRACE_IRQS_IRETQ
69#ifdef CONFIG_TRACE_IRQFLAGS 58#ifdef CONFIG_TRACE_IRQFLAGS
70 bt $9,EFLAGS(%rsp) /* interrupts off? */ 59 bt $9, EFLAGS(%rsp) /* interrupts off? */
71 jnc 1f 60 jnc 1f
72 TRACE_IRQS_ON 61 TRACE_IRQS_ON
731: 621:
74#endif 63#endif
@@ -88,89 +77,34 @@ ENDPROC(native_usergs_sysret64)
88#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_TRACE_IRQFLAGS) 77#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_TRACE_IRQFLAGS)
89 78
90.macro TRACE_IRQS_OFF_DEBUG 79.macro TRACE_IRQS_OFF_DEBUG
91 call debug_stack_set_zero 80 call debug_stack_set_zero
92 TRACE_IRQS_OFF 81 TRACE_IRQS_OFF
93 call debug_stack_reset 82 call debug_stack_reset
94.endm 83.endm
95 84
96.macro TRACE_IRQS_ON_DEBUG 85.macro TRACE_IRQS_ON_DEBUG
97 call debug_stack_set_zero 86 call debug_stack_set_zero
98 TRACE_IRQS_ON 87 TRACE_IRQS_ON
99 call debug_stack_reset 88 call debug_stack_reset
100.endm 89.endm
101 90
102.macro TRACE_IRQS_IRETQ_DEBUG 91.macro TRACE_IRQS_IRETQ_DEBUG
103 bt $9,EFLAGS(%rsp) /* interrupts off? */ 92 bt $9, EFLAGS(%rsp) /* interrupts off? */
104 jnc 1f 93 jnc 1f
105 TRACE_IRQS_ON_DEBUG 94 TRACE_IRQS_ON_DEBUG
1061: 951:
107.endm 96.endm
108 97
109#else 98#else
110# define TRACE_IRQS_OFF_DEBUG TRACE_IRQS_OFF 99# define TRACE_IRQS_OFF_DEBUG TRACE_IRQS_OFF
111# define TRACE_IRQS_ON_DEBUG TRACE_IRQS_ON 100# define TRACE_IRQS_ON_DEBUG TRACE_IRQS_ON
112# define TRACE_IRQS_IRETQ_DEBUG TRACE_IRQS_IRETQ 101# define TRACE_IRQS_IRETQ_DEBUG TRACE_IRQS_IRETQ
113#endif 102#endif
114 103
115/* 104/*
116 * empty frame 105 * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
117 */
118 .macro EMPTY_FRAME start=1 offset=0
119 .if \start
120 CFI_STARTPROC simple
121 CFI_SIGNAL_FRAME
122 CFI_DEF_CFA rsp,8+\offset
123 .else
124 CFI_DEF_CFA_OFFSET 8+\offset
125 .endif
126 .endm
127
128/*
129 * initial frame state for interrupts (and exceptions without error code)
130 */
131 .macro INTR_FRAME start=1 offset=0
132 EMPTY_FRAME \start, 5*8+\offset
133 /*CFI_REL_OFFSET ss, 4*8+\offset*/
134 CFI_REL_OFFSET rsp, 3*8+\offset
135 /*CFI_REL_OFFSET rflags, 2*8+\offset*/
136 /*CFI_REL_OFFSET cs, 1*8+\offset*/
137 CFI_REL_OFFSET rip, 0*8+\offset
138 .endm
139
140/*
141 * initial frame state for exceptions with error code (and interrupts
142 * with vector already pushed)
143 */
144 .macro XCPT_FRAME start=1 offset=0
145 INTR_FRAME \start, 1*8+\offset
146 .endm
147
148/*
149 * frame that enables passing a complete pt_regs to a C function.
150 */
151 .macro DEFAULT_FRAME start=1 offset=0
152 XCPT_FRAME \start, ORIG_RAX+\offset
153 CFI_REL_OFFSET rdi, RDI+\offset
154 CFI_REL_OFFSET rsi, RSI+\offset
155 CFI_REL_OFFSET rdx, RDX+\offset
156 CFI_REL_OFFSET rcx, RCX+\offset
157 CFI_REL_OFFSET rax, RAX+\offset
158 CFI_REL_OFFSET r8, R8+\offset
159 CFI_REL_OFFSET r9, R9+\offset
160 CFI_REL_OFFSET r10, R10+\offset
161 CFI_REL_OFFSET r11, R11+\offset
162 CFI_REL_OFFSET rbx, RBX+\offset
163 CFI_REL_OFFSET rbp, RBP+\offset
164 CFI_REL_OFFSET r12, R12+\offset
165 CFI_REL_OFFSET r13, R13+\offset
166 CFI_REL_OFFSET r14, R14+\offset
167 CFI_REL_OFFSET r15, R15+\offset
168 .endm
169
170/*
171 * 64bit SYSCALL instruction entry. Up to 6 arguments in registers.
172 * 106 *
173 * 64bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11, 107 * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
174 * then loads new ss, cs, and rip from previously programmed MSRs. 108 * then loads new ss, cs, and rip from previously programmed MSRs.
175 * rflags gets masked by a value from another MSR (so CLD and CLAC 109 * rflags gets masked by a value from another MSR (so CLD and CLAC
176 * are not needed). SYSCALL does not save anything on the stack 110 * are not needed). SYSCALL does not save anything on the stack
@@ -186,7 +120,7 @@ ENDPROC(native_usergs_sysret64)
186 * r10 arg3 (needs to be moved to rcx to conform to C ABI) 120 * r10 arg3 (needs to be moved to rcx to conform to C ABI)
187 * r8 arg4 121 * r8 arg4
188 * r9 arg5 122 * r9 arg5
189 * (note: r12-r15,rbp,rbx are callee-preserved in C ABI) 123 * (note: r12-r15, rbp, rbx are callee-preserved in C ABI)
190 * 124 *
191 * Only called from user space. 125 * Only called from user space.
192 * 126 *
@@ -195,13 +129,7 @@ ENDPROC(native_usergs_sysret64)
195 * with them due to bugs in both AMD and Intel CPUs. 129 * with them due to bugs in both AMD and Intel CPUs.
196 */ 130 */
197 131
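For reference, a minimal user-space caller following the register convention described above (hypothetical code: the syscall number assumes __NR_write == 1 on x86-64, and 'msg'/'len' are placeholders):

	movl	$1, %eax		/* system call number (__NR_write, assumed) */
	movl	$1, %edi		/* arg0: fd (stdout) */
	leaq	msg(%rip), %rsi		/* arg1: buffer */
	movl	$len, %edx		/* arg2: byte count */
	syscall				/* rcx := return rip, r11 := rflags */
					/* rax := return value or -errno */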
198ENTRY(system_call) 132ENTRY(entry_SYSCALL_64)
199 CFI_STARTPROC simple
200 CFI_SIGNAL_FRAME
201 CFI_DEF_CFA rsp,0
202 CFI_REGISTER rip,rcx
203 /*CFI_REGISTER rflags,r11*/
204
205 /* 133 /*
206 * Interrupts are off on entry. 134 * Interrupts are off on entry.
207 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON, 135 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
@@ -213,14 +141,14 @@ ENTRY(system_call)
213 * after the swapgs, so that it can do the swapgs 141 * after the swapgs, so that it can do the swapgs
214 * for the guest and jump here on syscall. 142 * for the guest and jump here on syscall.
215 */ 143 */
216GLOBAL(system_call_after_swapgs) 144GLOBAL(entry_SYSCALL_64_after_swapgs)
217 145
218 movq %rsp,PER_CPU_VAR(rsp_scratch) 146 movq %rsp, PER_CPU_VAR(rsp_scratch)
219 movq PER_CPU_VAR(kernel_stack),%rsp 147 movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
220 148
221 /* Construct struct pt_regs on stack */ 149 /* Construct struct pt_regs on stack */
222 pushq_cfi $__USER_DS /* pt_regs->ss */ 150 pushq $__USER_DS /* pt_regs->ss */
223 pushq_cfi PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */ 151 pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
224 /* 152 /*
225 * Re-enable interrupts. 153 * Re-enable interrupts.
226 * We use 'rsp_scratch' as a scratch space, hence irq-off block above 154 * We use 'rsp_scratch' as a scratch space, hence irq-off block above
@@ -229,36 +157,34 @@ GLOBAL(system_call_after_swapgs)
229 * with using rsp_scratch: 157 * with using rsp_scratch:
230 */ 158 */
231 ENABLE_INTERRUPTS(CLBR_NONE) 159 ENABLE_INTERRUPTS(CLBR_NONE)
232 pushq_cfi %r11 /* pt_regs->flags */ 160 pushq %r11 /* pt_regs->flags */
233 pushq_cfi $__USER_CS /* pt_regs->cs */ 161 pushq $__USER_CS /* pt_regs->cs */
234 pushq_cfi %rcx /* pt_regs->ip */ 162 pushq %rcx /* pt_regs->ip */
235 CFI_REL_OFFSET rip,0 163 pushq %rax /* pt_regs->orig_ax */
236 pushq_cfi_reg rax /* pt_regs->orig_ax */ 164 pushq %rdi /* pt_regs->di */
237 pushq_cfi_reg rdi /* pt_regs->di */ 165 pushq %rsi /* pt_regs->si */
238 pushq_cfi_reg rsi /* pt_regs->si */ 166 pushq %rdx /* pt_regs->dx */
239 pushq_cfi_reg rdx /* pt_regs->dx */ 167 pushq %rcx /* pt_regs->cx */
240 pushq_cfi_reg rcx /* pt_regs->cx */ 168 pushq $-ENOSYS /* pt_regs->ax */
241 pushq_cfi $-ENOSYS /* pt_regs->ax */ 169 pushq %r8 /* pt_regs->r8 */
242 pushq_cfi_reg r8 /* pt_regs->r8 */ 170 pushq %r9 /* pt_regs->r9 */
243 pushq_cfi_reg r9 /* pt_regs->r9 */ 171 pushq %r10 /* pt_regs->r10 */
244 pushq_cfi_reg r10 /* pt_regs->r10 */ 172 pushq %r11 /* pt_regs->r11 */
245 pushq_cfi_reg r11 /* pt_regs->r11 */ 173 sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */
246 sub $(6*8),%rsp /* pt_regs->bp,bx,r12-15 not saved */ 174
247 CFI_ADJUST_CFA_OFFSET 6*8 175 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
248 176 jnz tracesys
249 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS) 177entry_SYSCALL_64_fastpath:
250 jnz tracesys
251system_call_fastpath:
252#if __SYSCALL_MASK == ~0 178#if __SYSCALL_MASK == ~0
253 cmpq $__NR_syscall_max,%rax 179 cmpq $__NR_syscall_max, %rax
254#else 180#else
255 andl $__SYSCALL_MASK,%eax 181 andl $__SYSCALL_MASK, %eax
256 cmpl $__NR_syscall_max,%eax 182 cmpl $__NR_syscall_max, %eax
257#endif 183#endif
258 ja 1f /* return -ENOSYS (already in pt_regs->ax) */ 184 ja 1f /* return -ENOSYS (already in pt_regs->ax) */
259 movq %r10,%rcx 185 movq %r10, %rcx
260 call *sys_call_table(,%rax,8) 186 call *sys_call_table(, %rax, 8)
261 movq %rax,RAX(%rsp) 187 movq %rax, RAX(%rsp)
2621: 1881:
263/* 189/*
264 * Syscall return path ending with SYSRET (fast path). 190 * Syscall return path ending with SYSRET (fast path).
@@ -279,19 +205,15 @@ system_call_fastpath:
279 * flags (TIF_NOTIFY_RESUME, TIF_USER_RETURN_NOTIFY, etc) set is 205 * flags (TIF_NOTIFY_RESUME, TIF_USER_RETURN_NOTIFY, etc) set is
280 * very bad. 206 * very bad.
281 */ 207 */
282 testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS) 208 testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
283 jnz int_ret_from_sys_call_irqs_off /* Go to the slow path */ 209 jnz int_ret_from_sys_call_irqs_off /* Go to the slow path */
284
285 CFI_REMEMBER_STATE
286 210
287 RESTORE_C_REGS_EXCEPT_RCX_R11 211 RESTORE_C_REGS_EXCEPT_RCX_R11
288 movq RIP(%rsp),%rcx 212 movq RIP(%rsp), %rcx
289 CFI_REGISTER rip,rcx 213 movq EFLAGS(%rsp), %r11
290 movq EFLAGS(%rsp),%r11 214 movq RSP(%rsp), %rsp
291 /*CFI_REGISTER rflags,r11*/
292 movq RSP(%rsp),%rsp
293 /* 215 /*
294 * 64bit SYSRET restores rip from rcx, 216 * 64-bit SYSRET restores rip from rcx,
295 * rflags from r11 (but RF and VM bits are forced to 0), 217 * rflags from r11 (but RF and VM bits are forced to 0),
296 * cs and ss are loaded from MSRs. 218 * cs and ss are loaded from MSRs.
297 * Restoration of rflags re-enables interrupts. 219 * Restoration of rflags re-enables interrupts.
@@ -307,25 +229,23 @@ system_call_fastpath:
307 */ 229 */
308 USERGS_SYSRET64 230 USERGS_SYSRET64
309 231
310 CFI_RESTORE_STATE
311
312 /* Do syscall entry tracing */ 232 /* Do syscall entry tracing */
313tracesys: 233tracesys:
314 movq %rsp, %rdi 234 movq %rsp, %rdi
315 movl $AUDIT_ARCH_X86_64, %esi 235 movl $AUDIT_ARCH_X86_64, %esi
316 call syscall_trace_enter_phase1 236 call syscall_trace_enter_phase1
317 test %rax, %rax 237 test %rax, %rax
318 jnz tracesys_phase2 /* if needed, run the slow path */ 238 jnz tracesys_phase2 /* if needed, run the slow path */
319 RESTORE_C_REGS_EXCEPT_RAX /* else restore clobbered regs */ 239 RESTORE_C_REGS_EXCEPT_RAX /* else restore clobbered regs */
320 movq ORIG_RAX(%rsp), %rax 240 movq ORIG_RAX(%rsp), %rax
321 jmp system_call_fastpath /* and return to the fast path */ 241 jmp entry_SYSCALL_64_fastpath /* and return to the fast path */
322 242
323tracesys_phase2: 243tracesys_phase2:
324 SAVE_EXTRA_REGS 244 SAVE_EXTRA_REGS
325 movq %rsp, %rdi 245 movq %rsp, %rdi
326 movl $AUDIT_ARCH_X86_64, %esi 246 movl $AUDIT_ARCH_X86_64, %esi
327 movq %rax,%rdx 247 movq %rax, %rdx
328 call syscall_trace_enter_phase2 248 call syscall_trace_enter_phase2
329 249
330 /* 250 /*
331 * Reload registers from stack in case ptrace changed them. 251 * Reload registers from stack in case ptrace changed them.
@@ -335,15 +255,15 @@ tracesys_phase2:
335 RESTORE_C_REGS_EXCEPT_RAX 255 RESTORE_C_REGS_EXCEPT_RAX
336 RESTORE_EXTRA_REGS 256 RESTORE_EXTRA_REGS
337#if __SYSCALL_MASK == ~0 257#if __SYSCALL_MASK == ~0
338 cmpq $__NR_syscall_max,%rax 258 cmpq $__NR_syscall_max, %rax
339#else 259#else
340 andl $__SYSCALL_MASK,%eax 260 andl $__SYSCALL_MASK, %eax
341 cmpl $__NR_syscall_max,%eax 261 cmpl $__NR_syscall_max, %eax
342#endif 262#endif
343 ja 1f /* return -ENOSYS (already in pt_regs->ax) */ 263 ja 1f /* return -ENOSYS (already in pt_regs->ax) */
344 movq %r10,%rcx /* fixup for C */ 264 movq %r10, %rcx /* fixup for C */
345 call *sys_call_table(,%rax,8) 265 call *sys_call_table(, %rax, 8)
346 movq %rax,RAX(%rsp) 266 movq %rax, RAX(%rsp)
3471: 2671:
348 /* Use IRET because user could have changed pt_regs->foo */ 268 /* Use IRET because user could have changed pt_regs->foo */
349 269
@@ -355,31 +275,33 @@ GLOBAL(int_ret_from_sys_call)
355 DISABLE_INTERRUPTS(CLBR_NONE) 275 DISABLE_INTERRUPTS(CLBR_NONE)
356int_ret_from_sys_call_irqs_off: /* jumps come here from the irqs-off SYSRET path */ 276int_ret_from_sys_call_irqs_off: /* jumps come here from the irqs-off SYSRET path */
357 TRACE_IRQS_OFF 277 TRACE_IRQS_OFF
358 movl $_TIF_ALLWORK_MASK,%edi 278 movl $_TIF_ALLWORK_MASK, %edi
359 /* edi: mask to check */ 279 /* edi: mask to check */
360GLOBAL(int_with_check) 280GLOBAL(int_with_check)
361 LOCKDEP_SYS_EXIT_IRQ 281 LOCKDEP_SYS_EXIT_IRQ
362 GET_THREAD_INFO(%rcx) 282 GET_THREAD_INFO(%rcx)
363 movl TI_flags(%rcx),%edx 283 movl TI_flags(%rcx), %edx
364 andl %edi,%edx 284 andl %edi, %edx
365 jnz int_careful 285 jnz int_careful
366 andl $~TS_COMPAT,TI_status(%rcx) 286 andl $~TS_COMPAT, TI_status(%rcx)
367 jmp syscall_return 287 jmp syscall_return
368 288
369 /* Either reschedule or signal or syscall exit tracking needed. */ 289 /*
370 /* First do a reschedule test. */ 290 * Either reschedule or signal or syscall exit tracking needed.
371 /* edx: work, edi: workmask */ 291 * First do a reschedule test.
292 * edx: work, edi: workmask
293 */
372int_careful: 294int_careful:
373 bt $TIF_NEED_RESCHED,%edx 295 bt $TIF_NEED_RESCHED, %edx
374 jnc int_very_careful 296 jnc int_very_careful
375 TRACE_IRQS_ON 297 TRACE_IRQS_ON
376 ENABLE_INTERRUPTS(CLBR_NONE) 298 ENABLE_INTERRUPTS(CLBR_NONE)
377 pushq_cfi %rdi 299 pushq %rdi
378 SCHEDULE_USER 300 SCHEDULE_USER
379 popq_cfi %rdi 301 popq %rdi
380 DISABLE_INTERRUPTS(CLBR_NONE) 302 DISABLE_INTERRUPTS(CLBR_NONE)
381 TRACE_IRQS_OFF 303 TRACE_IRQS_OFF
382 jmp int_with_check 304 jmp int_with_check
383 305
384 /* handle signals and tracing -- both require a full pt_regs */ 306 /* handle signals and tracing -- both require a full pt_regs */
385int_very_careful: 307int_very_careful:
@@ -387,27 +309,27 @@ int_very_careful:
387 ENABLE_INTERRUPTS(CLBR_NONE) 309 ENABLE_INTERRUPTS(CLBR_NONE)
388 SAVE_EXTRA_REGS 310 SAVE_EXTRA_REGS
389 /* Check for syscall exit trace */ 311 /* Check for syscall exit trace */
390 testl $_TIF_WORK_SYSCALL_EXIT,%edx 312 testl $_TIF_WORK_SYSCALL_EXIT, %edx
391 jz int_signal 313 jz int_signal
392 pushq_cfi %rdi 314 pushq %rdi
393 leaq 8(%rsp),%rdi # &ptregs -> arg1 315 leaq 8(%rsp), %rdi /* &ptregs -> arg1 */
394 call syscall_trace_leave 316 call syscall_trace_leave
395 popq_cfi %rdi 317 popq %rdi
396 andl $~(_TIF_WORK_SYSCALL_EXIT|_TIF_SYSCALL_EMU),%edi 318 andl $~(_TIF_WORK_SYSCALL_EXIT|_TIF_SYSCALL_EMU), %edi
397 jmp int_restore_rest 319 jmp int_restore_rest
398 320
399int_signal: 321int_signal:
400 testl $_TIF_DO_NOTIFY_MASK,%edx 322 testl $_TIF_DO_NOTIFY_MASK, %edx
401 jz 1f 323 jz 1f
402 movq %rsp,%rdi # &ptregs -> arg1 324 movq %rsp, %rdi /* &ptregs -> arg1 */
403 xorl %esi,%esi # oldset -> arg2 325 xorl %esi, %esi /* oldset -> arg2 */
404 call do_notify_resume 326 call do_notify_resume
4051: movl $_TIF_WORK_MASK,%edi 3271: movl $_TIF_WORK_MASK, %edi
406int_restore_rest: 328int_restore_rest:
407 RESTORE_EXTRA_REGS 329 RESTORE_EXTRA_REGS
408 DISABLE_INTERRUPTS(CLBR_NONE) 330 DISABLE_INTERRUPTS(CLBR_NONE)
409 TRACE_IRQS_OFF 331 TRACE_IRQS_OFF
410 jmp int_with_check 332 jmp int_with_check
411 333
412syscall_return: 334syscall_return:
413 /* The IRETQ could re-enable interrupts: */ 335 /* The IRETQ could re-enable interrupts: */
@@ -418,34 +340,37 @@ syscall_return:
418 * Try to use SYSRET instead of IRET if we're returning to 340 * Try to use SYSRET instead of IRET if we're returning to
419 * a completely clean 64-bit userspace context. 341 * a completely clean 64-bit userspace context.
420 */ 342 */
421 movq RCX(%rsp),%rcx 343 movq RCX(%rsp), %rcx
422 cmpq %rcx,RIP(%rsp) /* RCX == RIP */ 344 movq RIP(%rsp), %r11
423 jne opportunistic_sysret_failed 345 cmpq %rcx, %r11 /* RCX == RIP */
346 jne opportunistic_sysret_failed
424 347
425 /* 348 /*
426 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP 349 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
427 * in kernel space. This essentially lets the user take over 350 * in kernel space. This essentially lets the user take over
428 * the kernel, since userspace controls RSP. It's not worth 351 * the kernel, since userspace controls RSP.
429 * testing for canonicalness exactly -- this check detects any
430 * of the 17 high bits set, which is true for non-canonical
431 * or kernel addresses. (This will pessimize vsyscall=native.
432 * Big deal.)
433 * 352 *
434 * If virtual addresses ever become wider, this will need 353 * If width of "canonical tail" ever becomes variable, this will need
435 * to be updated to remain correct on both old and new CPUs. 354 * to be updated to remain correct on both old and new CPUs.
436 */ 355 */
437 .ifne __VIRTUAL_MASK_SHIFT - 47 356 .ifne __VIRTUAL_MASK_SHIFT - 47
438 .error "virtual address width changed -- SYSRET checks need update" 357 .error "virtual address width changed -- SYSRET checks need update"
439 .endif 358 .endif
440 shr $__VIRTUAL_MASK_SHIFT, %rcx
441 jnz opportunistic_sysret_failed
442 359
443 cmpq $__USER_CS,CS(%rsp) /* CS must match SYSRET */ 360 /* Change top 16 bits to be the sign-extension of 47th bit */
444 jne opportunistic_sysret_failed 361 shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
362 sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
445 363
446 movq R11(%rsp),%r11 364 /* If this changed %rcx, it was not canonical */
447 cmpq %r11,EFLAGS(%rsp) /* R11 == RFLAGS */ 365 cmpq %rcx, %r11
448 jne opportunistic_sysret_failed 366 jne opportunistic_sysret_failed
367
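The shl/sar pair above is a sign-extension test: with __VIRTUAL_MASK_SHIFT == 47 it shifts by 16 and replaces bits 63..48 with copies of bit 47. Two sample values (invented for illustration):

	/* 0x00007fffffffe000 -> shl 16 -> 0x7fffffffe0000000                */
	/*                    -> sar 16 -> 0x00007fffffffe000  (unchanged:   */
	/*                       canonical, the SYSRET path may be used)      */
	/* 0x0000800000000000 -> shl 16 -> 0x8000000000000000                */
	/*                    -> sar 16 -> 0xffff800000000000  (changed:     */
	/*                       not canonical, fall back to the IRET path)   */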
368 cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
369 jne opportunistic_sysret_failed
370
371 movq R11(%rsp), %r11
372 cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
373 jne opportunistic_sysret_failed
449 374
450 /* 375 /*
451 * SYSRET can't restore RF. SYSRET can restore TF, but unlike IRET, 376 * SYSRET can't restore RF. SYSRET can restore TF, but unlike IRET,
@@ -454,47 +379,41 @@ syscall_return:
454 * with register state that satisfies the opportunistic SYSRET 379 * with register state that satisfies the opportunistic SYSRET
455 * conditions. For example, single-stepping this user code: 380 * conditions. For example, single-stepping this user code:
456 * 381 *
457 * movq $stuck_here,%rcx 382 * movq $stuck_here, %rcx
458 * pushfq 383 * pushfq
459 * popq %r11 384 * popq %r11
460 * stuck_here: 385 * stuck_here:
461 * 386 *
462 * would never get past 'stuck_here'. 387 * would never get past 'stuck_here'.
463 */ 388 */
464 testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11 389 testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
465 jnz opportunistic_sysret_failed 390 jnz opportunistic_sysret_failed
466 391
467 /* nothing to check for RSP */ 392 /* nothing to check for RSP */
468 393
469 cmpq $__USER_DS,SS(%rsp) /* SS must match SYSRET */ 394 cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
470 jne opportunistic_sysret_failed 395 jne opportunistic_sysret_failed
471 396
472 /* 397 /*
473 * We win! This label is here just for ease of understanding 398 * We win! This label is here just for ease of understanding
474 * perf profiles. Nothing jumps here. 399 * perf profiles. Nothing jumps here.
475 */ 400 */
476syscall_return_via_sysret: 401syscall_return_via_sysret:
477 CFI_REMEMBER_STATE 402 /* rcx and r11 are already restored (see code above) */
478 /* r11 is already restored (see code above) */ 403 RESTORE_C_REGS_EXCEPT_RCX_R11
479 RESTORE_C_REGS_EXCEPT_R11 404 movq RSP(%rsp), %rsp
480 movq RSP(%rsp),%rsp
481 USERGS_SYSRET64 405 USERGS_SYSRET64
482 CFI_RESTORE_STATE
483 406
484opportunistic_sysret_failed: 407opportunistic_sysret_failed:
485 SWAPGS 408 SWAPGS
486 jmp restore_c_regs_and_iret 409 jmp restore_c_regs_and_iret
487 CFI_ENDPROC 410END(entry_SYSCALL_64)
488END(system_call)
489 411
490 412
491 .macro FORK_LIKE func 413 .macro FORK_LIKE func
492ENTRY(stub_\func) 414ENTRY(stub_\func)
493 CFI_STARTPROC
494 DEFAULT_FRAME 0, 8 /* offset 8: return address */
495 SAVE_EXTRA_REGS 8 415 SAVE_EXTRA_REGS 8
496 jmp sys_\func 416 jmp sys_\func
497 CFI_ENDPROC
498END(stub_\func) 417END(stub_\func)
499 .endm 418 .endm
500 419
@@ -503,8 +422,6 @@ END(stub_\func)
503 FORK_LIKE vfork 422 FORK_LIKE vfork
504 423
505ENTRY(stub_execve) 424ENTRY(stub_execve)
506 CFI_STARTPROC
507 DEFAULT_FRAME 0, 8
508 call sys_execve 425 call sys_execve
509return_from_execve: 426return_from_execve:
510 testl %eax, %eax 427 testl %eax, %eax
@@ -514,11 +431,9 @@ return_from_execve:
5141: 4311:
515 /* must use IRET code path (pt_regs->cs may have changed) */ 432 /* must use IRET code path (pt_regs->cs may have changed) */
516 addq $8, %rsp 433 addq $8, %rsp
517 CFI_ADJUST_CFA_OFFSET -8
518 ZERO_EXTRA_REGS 434 ZERO_EXTRA_REGS
519 movq %rax,RAX(%rsp) 435 movq %rax, RAX(%rsp)
520 jmp int_ret_from_sys_call 436 jmp int_ret_from_sys_call
521 CFI_ENDPROC
522END(stub_execve) 437END(stub_execve)
523/* 438/*
524 * Remaining execve stubs are only 7 bytes long. 439 * Remaining execve stubs are only 7 bytes long.
@@ -526,47 +441,25 @@ END(stub_execve)
526 */ 441 */
527 .align 8 442 .align 8
528GLOBAL(stub_execveat) 443GLOBAL(stub_execveat)
529 CFI_STARTPROC
530 DEFAULT_FRAME 0, 8
531 call sys_execveat 444 call sys_execveat
532 jmp return_from_execve 445 jmp return_from_execve
533 CFI_ENDPROC
534END(stub_execveat) 446END(stub_execveat)
535 447
536#ifdef CONFIG_X86_X32_ABI 448#if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION)
537 .align 8 449 .align 8
538GLOBAL(stub_x32_execve) 450GLOBAL(stub_x32_execve)
539 CFI_STARTPROC
540 DEFAULT_FRAME 0, 8
541 call compat_sys_execve
542 jmp return_from_execve
543 CFI_ENDPROC
544END(stub_x32_execve)
545 .align 8
546GLOBAL(stub_x32_execveat)
547 CFI_STARTPROC
548 DEFAULT_FRAME 0, 8
549 call compat_sys_execveat
550 jmp return_from_execve
551 CFI_ENDPROC
552END(stub_x32_execveat)
553#endif
554
555#ifdef CONFIG_IA32_EMULATION
556 .align 8
557GLOBAL(stub32_execve) 451GLOBAL(stub32_execve)
558 CFI_STARTPROC
559 call compat_sys_execve 452 call compat_sys_execve
560 jmp return_from_execve 453 jmp return_from_execve
561 CFI_ENDPROC
562END(stub32_execve) 454END(stub32_execve)
455END(stub_x32_execve)
563 .align 8 456 .align 8
457GLOBAL(stub_x32_execveat)
564GLOBAL(stub32_execveat) 458GLOBAL(stub32_execveat)
565 CFI_STARTPROC
566 call compat_sys_execveat 459 call compat_sys_execveat
567 jmp return_from_execve 460 jmp return_from_execve
568 CFI_ENDPROC
569END(stub32_execveat) 461END(stub32_execveat)
462END(stub_x32_execveat)
570#endif 463#endif
571 464
572/* 465/*
@@ -574,8 +467,6 @@ END(stub32_execveat)
574 * This cannot be done with SYSRET, so use the IRET return path instead. 467 * This cannot be done with SYSRET, so use the IRET return path instead.
575 */ 468 */
576ENTRY(stub_rt_sigreturn) 469ENTRY(stub_rt_sigreturn)
577 CFI_STARTPROC
578 DEFAULT_FRAME 0, 8
579 /* 470 /*
580 * SAVE_EXTRA_REGS result is not normally needed: 471 * SAVE_EXTRA_REGS result is not normally needed:
581 * sigreturn overwrites all pt_regs->GPREGS. 472 * sigreturn overwrites all pt_regs->GPREGS.
@@ -584,24 +475,19 @@ ENTRY(stub_rt_sigreturn)
584 * we SAVE_EXTRA_REGS here. 475 * we SAVE_EXTRA_REGS here.
585 */ 476 */
586 SAVE_EXTRA_REGS 8 477 SAVE_EXTRA_REGS 8
587 call sys_rt_sigreturn 478 call sys_rt_sigreturn
588return_from_stub: 479return_from_stub:
589 addq $8, %rsp 480 addq $8, %rsp
590 CFI_ADJUST_CFA_OFFSET -8
591 RESTORE_EXTRA_REGS 481 RESTORE_EXTRA_REGS
592 movq %rax,RAX(%rsp) 482 movq %rax, RAX(%rsp)
593 jmp int_ret_from_sys_call 483 jmp int_ret_from_sys_call
594 CFI_ENDPROC
595END(stub_rt_sigreturn) 484END(stub_rt_sigreturn)
596 485
597#ifdef CONFIG_X86_X32_ABI 486#ifdef CONFIG_X86_X32_ABI
598ENTRY(stub_x32_rt_sigreturn) 487ENTRY(stub_x32_rt_sigreturn)
599 CFI_STARTPROC
600 DEFAULT_FRAME 0, 8
601 SAVE_EXTRA_REGS 8 488 SAVE_EXTRA_REGS 8
602 call sys32_x32_rt_sigreturn 489 call sys32_x32_rt_sigreturn
603 jmp return_from_stub 490 jmp return_from_stub
604 CFI_ENDPROC
605END(stub_x32_rt_sigreturn) 491END(stub_x32_rt_sigreturn)
606#endif 492#endif
607 493
@@ -611,36 +497,36 @@ END(stub_x32_rt_sigreturn)
611 * rdi: prev task we switched from 497 * rdi: prev task we switched from
612 */ 498 */
613ENTRY(ret_from_fork) 499ENTRY(ret_from_fork)
614 DEFAULT_FRAME
615 500
616 LOCK ; btr $TIF_FORK,TI_flags(%r8) 501 LOCK ; btr $TIF_FORK, TI_flags(%r8)
617 502
618 pushq_cfi $0x0002 503 pushq $0x0002
619 popfq_cfi # reset kernel eflags 504 popfq /* reset kernel eflags */
620 505
621 call schedule_tail # rdi: 'prev' task parameter 506 call schedule_tail /* rdi: 'prev' task parameter */
622 507
623 RESTORE_EXTRA_REGS 508 RESTORE_EXTRA_REGS
624 509
625 testl $3,CS(%rsp) # from kernel_thread? 510 testb $3, CS(%rsp) /* from kernel_thread? */
626 511
627 /* 512 /*
628 * By the time we get here, we have no idea whether our pt_regs, 513 * By the time we get here, we have no idea whether our pt_regs,
629 * ti flags, and ti status came from the 64-bit SYSCALL fast path, 514 * ti flags, and ti status came from the 64-bit SYSCALL fast path,
630 * the slow path, or one of the ia32entry paths. 515 * the slow path, or one of the 32-bit compat paths.
631 * Use IRET code path to return, since it can safely handle 516 * Use IRET code path to return, since it can safely handle
632 * all of the above. 517 * all of the above.
633 */ 518 */
634 jnz int_ret_from_sys_call 519 jnz int_ret_from_sys_call
635 520
636 /* We came from kernel_thread */ 521 /*
637 /* nb: we depend on RESTORE_EXTRA_REGS above */ 522 * We came from kernel_thread
638 movq %rbp, %rdi 523 * nb: we depend on RESTORE_EXTRA_REGS above
639 call *%rbx 524 */
640 movl $0, RAX(%rsp) 525 movq %rbp, %rdi
526 call *%rbx
527 movl $0, RAX(%rsp)
641 RESTORE_EXTRA_REGS 528 RESTORE_EXTRA_REGS
642 jmp int_ret_from_sys_call 529 jmp int_ret_from_sys_call
643 CFI_ENDPROC
644END(ret_from_fork) 530END(ret_from_fork)
645 531
646/* 532/*
@@ -649,16 +535,13 @@ END(ret_from_fork)
649 */ 535 */
650 .align 8 536 .align 8
651ENTRY(irq_entries_start) 537ENTRY(irq_entries_start)
652 INTR_FRAME
653 vector=FIRST_EXTERNAL_VECTOR 538 vector=FIRST_EXTERNAL_VECTOR
654 .rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR) 539 .rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
655 pushq_cfi $(~vector+0x80) /* Note: always in signed byte range */ 540 pushq $(~vector+0x80) /* Note: always in signed byte range */
656 vector=vector+1 541 vector=vector+1
657 jmp common_interrupt 542 jmp common_interrupt
658 CFI_ADJUST_CFA_OFFSET -8
659 .align 8 543 .align 8
660 .endr 544 .endr
661 CFI_ENDPROC
662END(irq_entries_start) 545END(irq_entries_start)
663 546
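A quick worked example of the vector encoding used above: ~vector + 0x80 equals 0x7f - vector, which lies in [-0x80, 0x7f] for any vector in 0..255, so the pushq needs only a one-byte immediate and helps each stub fit its 8-byte alignment slot. common_interrupt below then executes addq $-0x80, (%rsp), turning the saved value back into ~vector, i.e. a number in [-256, -1], from which the handler recovers the original vector. For instance, vector 0x20 is pushed as 0x5f and adjusted back to -0x21 (= ~0x20).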
664/* 547/*
@@ -684,10 +567,10 @@ END(irq_entries_start)
684 /* this goes to 0(%rsp) for unwinder, not for saving the value: */ 567 /* this goes to 0(%rsp) for unwinder, not for saving the value: */
685 SAVE_EXTRA_REGS_RBP -RBP 568 SAVE_EXTRA_REGS_RBP -RBP
686 569
687 leaq -RBP(%rsp),%rdi /* arg1 for \func (pointer to pt_regs) */ 570 leaq -RBP(%rsp), %rdi /* arg1 for \func (pointer to pt_regs) */
688 571
689 testl $3, CS-RBP(%rsp) 572 testb $3, CS-RBP(%rsp)
690 je 1f 573 jz 1f
691 SWAPGS 574 SWAPGS
6921: 5751:
693 /* 576 /*
@@ -697,24 +580,14 @@ END(irq_entries_start)
697 * a little cheaper to use a separate counter in the PDA (short of 580 * a little cheaper to use a separate counter in the PDA (short of
698 * moving irq_enter into assembly, which would be too much work) 581 * moving irq_enter into assembly, which would be too much work)
699 */ 582 */
700 movq %rsp, %rsi 583 movq %rsp, %rsi
701 incl PER_CPU_VAR(irq_count) 584 incl PER_CPU_VAR(irq_count)
702 cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp 585 cmovzq PER_CPU_VAR(irq_stack_ptr), %rsp
703 CFI_DEF_CFA_REGISTER rsi 586 pushq %rsi
704 pushq %rsi
705 /*
706 * For debugger:
707 * "CFA (Current Frame Address) is the value on stack + offset"
708 */
709 CFI_ESCAPE 0x0f /* DW_CFA_def_cfa_expression */, 6, \
710 0x77 /* DW_OP_breg7 (rsp) */, 0, \
711 0x06 /* DW_OP_deref */, \
712 0x08 /* DW_OP_const1u */, SIZEOF_PTREGS-RBP, \
713 0x22 /* DW_OP_plus */
714 /* We entered an interrupt context - irqs are off: */ 587 /* We entered an interrupt context - irqs are off: */
715 TRACE_IRQS_OFF 588 TRACE_IRQS_OFF
716 589
717 call \func 590 call \func
718 .endm 591 .endm
719 592
720 /* 593 /*
@@ -723,42 +596,36 @@ END(irq_entries_start)
723 */ 596 */
724 .p2align CONFIG_X86_L1_CACHE_SHIFT 597 .p2align CONFIG_X86_L1_CACHE_SHIFT
725common_interrupt: 598common_interrupt:
726 XCPT_FRAME
727 ASM_CLAC 599 ASM_CLAC
728 addq $-0x80,(%rsp) /* Adjust vector to [-256,-1] range */ 600 addq $-0x80, (%rsp) /* Adjust vector to [-256, -1] range */
729 interrupt do_IRQ 601 interrupt do_IRQ
730 /* 0(%rsp): old RSP */ 602 /* 0(%rsp): old RSP */
731ret_from_intr: 603ret_from_intr:
732 DISABLE_INTERRUPTS(CLBR_NONE) 604 DISABLE_INTERRUPTS(CLBR_NONE)
733 TRACE_IRQS_OFF 605 TRACE_IRQS_OFF
734 decl PER_CPU_VAR(irq_count) 606 decl PER_CPU_VAR(irq_count)
735 607
736 /* Restore saved previous stack */ 608 /* Restore saved previous stack */
737 popq %rsi 609 popq %rsi
738 CFI_DEF_CFA rsi,SIZEOF_PTREGS-RBP /* reg/off reset after def_cfa_expr */
739 /* return code expects complete pt_regs - adjust rsp accordingly: */ 610 /* return code expects complete pt_regs - adjust rsp accordingly: */
740 leaq -RBP(%rsi),%rsp 611 leaq -RBP(%rsi), %rsp
741 CFI_DEF_CFA_REGISTER rsp
742 CFI_ADJUST_CFA_OFFSET RBP
743 612
744 testl $3,CS(%rsp) 613 testb $3, CS(%rsp)
745 je retint_kernel 614 jz retint_kernel
746 /* Interrupt came from user space */ 615 /* Interrupt came from user space */
747 616retint_user:
748 GET_THREAD_INFO(%rcx) 617 GET_THREAD_INFO(%rcx)
749 /* 618
750 * %rcx: thread info. Interrupts off. 619 /* %rcx: thread info. Interrupts are off. */
751 */
752retint_with_reschedule: 620retint_with_reschedule:
753 movl $_TIF_WORK_MASK,%edi 621 movl $_TIF_WORK_MASK, %edi
754retint_check: 622retint_check:
755 LOCKDEP_SYS_EXIT_IRQ 623 LOCKDEP_SYS_EXIT_IRQ
756 movl TI_flags(%rcx),%edx 624 movl TI_flags(%rcx), %edx
757 andl %edi,%edx 625 andl %edi, %edx
758 CFI_REMEMBER_STATE 626 jnz retint_careful
759 jnz retint_careful
760 627
761retint_swapgs: /* return to user-space */ 628retint_swapgs: /* return to user-space */
762 /* 629 /*
763 * The iretq could re-enable interrupts: 630 * The iretq could re-enable interrupts:
764 */ 631 */
@@ -773,9 +640,9 @@ retint_kernel:
773#ifdef CONFIG_PREEMPT 640#ifdef CONFIG_PREEMPT
774 /* Interrupts are off */ 641 /* Interrupts are off */
775 /* Check if we need preemption */ 642 /* Check if we need preemption */
776 bt $9,EFLAGS(%rsp) /* interrupts were off? */ 643 bt $9, EFLAGS(%rsp) /* were interrupts off? */
777 jnc 1f 644 jnc 1f
7780: cmpl $0,PER_CPU_VAR(__preempt_count) 6450: cmpl $0, PER_CPU_VAR(__preempt_count)
779 jnz 1f 646 jnz 1f
780 call preempt_schedule_irq 647 call preempt_schedule_irq
781 jmp 0b 648 jmp 0b
@@ -793,8 +660,6 @@ retint_kernel:
793restore_c_regs_and_iret: 660restore_c_regs_and_iret:
794 RESTORE_C_REGS 661 RESTORE_C_REGS
795 REMOVE_PT_GPREGS_FROM_STACK 8 662 REMOVE_PT_GPREGS_FROM_STACK 8
796
797irq_return:
798 INTERRUPT_RETURN 663 INTERRUPT_RETURN
799 664
800ENTRY(native_iret) 665ENTRY(native_iret)
@@ -803,8 +668,8 @@ ENTRY(native_iret)
803 * 64-bit mode SS:RSP on the exception stack is always valid. 668 * 64-bit mode SS:RSP on the exception stack is always valid.
804 */ 669 */
805#ifdef CONFIG_X86_ESPFIX64 670#ifdef CONFIG_X86_ESPFIX64
806 testb $4,(SS-RIP)(%rsp) 671 testb $4, (SS-RIP)(%rsp)
807 jnz native_irq_return_ldt 672 jnz native_irq_return_ldt
808#endif 673#endif
809 674
810.global native_irq_return_iret 675.global native_irq_return_iret
@@ -819,62 +684,60 @@ native_irq_return_iret:
819 684
820#ifdef CONFIG_X86_ESPFIX64 685#ifdef CONFIG_X86_ESPFIX64
821native_irq_return_ldt: 686native_irq_return_ldt:
822 pushq_cfi %rax 687 pushq %rax
823 pushq_cfi %rdi 688 pushq %rdi
824 SWAPGS 689 SWAPGS
825 movq PER_CPU_VAR(espfix_waddr),%rdi 690 movq PER_CPU_VAR(espfix_waddr), %rdi
826 movq %rax,(0*8)(%rdi) /* RAX */ 691 movq %rax, (0*8)(%rdi) /* RAX */
827 movq (2*8)(%rsp),%rax /* RIP */ 692 movq (2*8)(%rsp), %rax /* RIP */
828 movq %rax,(1*8)(%rdi) 693 movq %rax, (1*8)(%rdi)
829 movq (3*8)(%rsp),%rax /* CS */ 694 movq (3*8)(%rsp), %rax /* CS */
830 movq %rax,(2*8)(%rdi) 695 movq %rax, (2*8)(%rdi)
831 movq (4*8)(%rsp),%rax /* RFLAGS */ 696 movq (4*8)(%rsp), %rax /* RFLAGS */
832 movq %rax,(3*8)(%rdi) 697 movq %rax, (3*8)(%rdi)
833 movq (6*8)(%rsp),%rax /* SS */ 698 movq (6*8)(%rsp), %rax /* SS */
834 movq %rax,(5*8)(%rdi) 699 movq %rax, (5*8)(%rdi)
835 movq (5*8)(%rsp),%rax /* RSP */ 700 movq (5*8)(%rsp), %rax /* RSP */
836 movq %rax,(4*8)(%rdi) 701 movq %rax, (4*8)(%rdi)
837 andl $0xffff0000,%eax 702 andl $0xffff0000, %eax
838 popq_cfi %rdi 703 popq %rdi
839 orq PER_CPU_VAR(espfix_stack),%rax 704 orq PER_CPU_VAR(espfix_stack), %rax
840 SWAPGS 705 SWAPGS
841 movq %rax,%rsp 706 movq %rax, %rsp
842 popq_cfi %rax 707 popq %rax
843 jmp native_irq_return_iret 708 jmp native_irq_return_iret
844#endif 709#endif
845 710
846 /* edi: workmask, edx: work */ 711 /* edi: workmask, edx: work */
847retint_careful: 712retint_careful:
848 CFI_RESTORE_STATE 713 bt $TIF_NEED_RESCHED, %edx
849 bt $TIF_NEED_RESCHED,%edx 714 jnc retint_signal
850 jnc retint_signal
851 TRACE_IRQS_ON 715 TRACE_IRQS_ON
852 ENABLE_INTERRUPTS(CLBR_NONE) 716 ENABLE_INTERRUPTS(CLBR_NONE)
853 pushq_cfi %rdi 717 pushq %rdi
854 SCHEDULE_USER 718 SCHEDULE_USER
855 popq_cfi %rdi 719 popq %rdi
856 GET_THREAD_INFO(%rcx) 720 GET_THREAD_INFO(%rcx)
857 DISABLE_INTERRUPTS(CLBR_NONE) 721 DISABLE_INTERRUPTS(CLBR_NONE)
858 TRACE_IRQS_OFF 722 TRACE_IRQS_OFF
859 jmp retint_check 723 jmp retint_check
860 724
861retint_signal: 725retint_signal:
862 testl $_TIF_DO_NOTIFY_MASK,%edx 726 testl $_TIF_DO_NOTIFY_MASK, %edx
863 jz retint_swapgs 727 jz retint_swapgs
864 TRACE_IRQS_ON 728 TRACE_IRQS_ON
865 ENABLE_INTERRUPTS(CLBR_NONE) 729 ENABLE_INTERRUPTS(CLBR_NONE)
866 SAVE_EXTRA_REGS 730 SAVE_EXTRA_REGS
867 movq $-1,ORIG_RAX(%rsp) 731 movq $-1, ORIG_RAX(%rsp)
868 xorl %esi,%esi # oldset 732 xorl %esi, %esi /* oldset */
869 movq %rsp,%rdi # &pt_regs 733 movq %rsp, %rdi /* &pt_regs */
870 call do_notify_resume 734 call do_notify_resume
871 RESTORE_EXTRA_REGS 735 RESTORE_EXTRA_REGS
872 DISABLE_INTERRUPTS(CLBR_NONE) 736 DISABLE_INTERRUPTS(CLBR_NONE)
873 TRACE_IRQS_OFF 737 TRACE_IRQS_OFF
874 GET_THREAD_INFO(%rcx) 738 GET_THREAD_INFO(%rcx)
875 jmp retint_with_reschedule 739 jmp retint_with_reschedule
876 740
877 CFI_ENDPROC
878END(common_interrupt) 741END(common_interrupt)
879 742
880/* 743/*
@@ -882,13 +745,11 @@ END(common_interrupt)
882 */ 745 */
883.macro apicinterrupt3 num sym do_sym 746.macro apicinterrupt3 num sym do_sym
884ENTRY(\sym) 747ENTRY(\sym)
885 INTR_FRAME
886 ASM_CLAC 748 ASM_CLAC
887 pushq_cfi $~(\num) 749 pushq $~(\num)
888.Lcommon_\sym: 750.Lcommon_\sym:
889 interrupt \do_sym 751 interrupt \do_sym
890 jmp ret_from_intr 752 jmp ret_from_intr
891 CFI_ENDPROC
892END(\sym) 753END(\sym)
893.endm 754.endm
894 755
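As a concrete example, apicinterrupt3 REBOOT_VECTOR reboot_interrupt smp_reboot_interrupt (used below) expands via the macro above to roughly:

	ENTRY(reboot_interrupt)
		ASM_CLAC
		pushq	$~(REBOOT_VECTOR)
	.Lcommon_reboot_interrupt:
		interrupt smp_reboot_interrupt
		jmp	ret_from_intr
	END(reboot_interrupt)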
@@ -910,53 +771,45 @@ trace_apicinterrupt \num \sym
910.endm 771.endm
911 772
912#ifdef CONFIG_SMP 773#ifdef CONFIG_SMP
913apicinterrupt3 IRQ_MOVE_CLEANUP_VECTOR \ 774apicinterrupt3 IRQ_MOVE_CLEANUP_VECTOR irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt
914 irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt 775apicinterrupt3 REBOOT_VECTOR reboot_interrupt smp_reboot_interrupt
915apicinterrupt3 REBOOT_VECTOR \
916 reboot_interrupt smp_reboot_interrupt
917#endif 776#endif
918 777
919#ifdef CONFIG_X86_UV 778#ifdef CONFIG_X86_UV
920apicinterrupt3 UV_BAU_MESSAGE \ 779apicinterrupt3 UV_BAU_MESSAGE uv_bau_message_intr1 uv_bau_message_interrupt
921 uv_bau_message_intr1 uv_bau_message_interrupt
922#endif 780#endif
923apicinterrupt LOCAL_TIMER_VECTOR \ 781
924 apic_timer_interrupt smp_apic_timer_interrupt 782apicinterrupt LOCAL_TIMER_VECTOR apic_timer_interrupt smp_apic_timer_interrupt
925apicinterrupt X86_PLATFORM_IPI_VECTOR \ 783apicinterrupt X86_PLATFORM_IPI_VECTOR x86_platform_ipi smp_x86_platform_ipi
926 x86_platform_ipi smp_x86_platform_ipi
927 784
928#ifdef CONFIG_HAVE_KVM 785#ifdef CONFIG_HAVE_KVM
929apicinterrupt3 POSTED_INTR_VECTOR \ 786apicinterrupt3 POSTED_INTR_VECTOR kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
930 kvm_posted_intr_ipi smp_kvm_posted_intr_ipi 787apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
931#endif 788#endif
932 789
933#ifdef CONFIG_X86_MCE_THRESHOLD 790#ifdef CONFIG_X86_MCE_THRESHOLD
934apicinterrupt THRESHOLD_APIC_VECTOR \ 791apicinterrupt THRESHOLD_APIC_VECTOR threshold_interrupt smp_threshold_interrupt
935 threshold_interrupt smp_threshold_interrupt 792#endif
793
794#ifdef CONFIG_X86_MCE_AMD
795apicinterrupt DEFERRED_ERROR_VECTOR deferred_error_interrupt smp_deferred_error_interrupt
936#endif 796#endif
937 797
938#ifdef CONFIG_X86_THERMAL_VECTOR 798#ifdef CONFIG_X86_THERMAL_VECTOR
939apicinterrupt THERMAL_APIC_VECTOR \ 799apicinterrupt THERMAL_APIC_VECTOR thermal_interrupt smp_thermal_interrupt
940 thermal_interrupt smp_thermal_interrupt
941#endif 800#endif
942 801
943#ifdef CONFIG_SMP 802#ifdef CONFIG_SMP
944apicinterrupt CALL_FUNCTION_SINGLE_VECTOR \ 803apicinterrupt CALL_FUNCTION_SINGLE_VECTOR call_function_single_interrupt smp_call_function_single_interrupt
945 call_function_single_interrupt smp_call_function_single_interrupt 804apicinterrupt CALL_FUNCTION_VECTOR call_function_interrupt smp_call_function_interrupt
946apicinterrupt CALL_FUNCTION_VECTOR \ 805apicinterrupt RESCHEDULE_VECTOR reschedule_interrupt smp_reschedule_interrupt
947 call_function_interrupt smp_call_function_interrupt
948apicinterrupt RESCHEDULE_VECTOR \
949 reschedule_interrupt smp_reschedule_interrupt
950#endif 806#endif
951 807
952apicinterrupt ERROR_APIC_VECTOR \ 808apicinterrupt ERROR_APIC_VECTOR error_interrupt smp_error_interrupt
953 error_interrupt smp_error_interrupt 809apicinterrupt SPURIOUS_APIC_VECTOR spurious_interrupt smp_spurious_interrupt
954apicinterrupt SPURIOUS_APIC_VECTOR \
955 spurious_interrupt smp_spurious_interrupt
956 810
957#ifdef CONFIG_IRQ_WORK 811#ifdef CONFIG_IRQ_WORK
958apicinterrupt IRQ_WORK_VECTOR \ 812apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
959 irq_work_interrupt smp_irq_work_interrupt
960#endif 813#endif
961 814
962/* 815/*
@@ -971,100 +824,87 @@ ENTRY(\sym)
971 .error "using shift_ist requires paranoid=1" 824 .error "using shift_ist requires paranoid=1"
972 .endif 825 .endif
973 826
974 .if \has_error_code
975 XCPT_FRAME
976 .else
977 INTR_FRAME
978 .endif
979
980 ASM_CLAC 827 ASM_CLAC
981 PARAVIRT_ADJUST_EXCEPTION_FRAME 828 PARAVIRT_ADJUST_EXCEPTION_FRAME
982 829
983 .ifeq \has_error_code 830 .ifeq \has_error_code
984 pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */ 831 pushq $-1 /* ORIG_RAX: no syscall to restart */
985 .endif 832 .endif
986 833
987 ALLOC_PT_GPREGS_ON_STACK 834 ALLOC_PT_GPREGS_ON_STACK
988 835
989 .if \paranoid 836 .if \paranoid
990 .if \paranoid == 1 837 .if \paranoid == 1
991 CFI_REMEMBER_STATE 838 testb $3, CS(%rsp) /* If coming from userspace, switch stacks */
992 testl $3, CS(%rsp) /* If coming from userspace, switch */ 839 jnz 1f
993 jnz 1f /* stacks. */
994 .endif 840 .endif
995 call paranoid_entry 841 call paranoid_entry
996 .else 842 .else
997 call error_entry 843 call error_entry
998 .endif 844 .endif
999 /* returned flag: ebx=0: need swapgs on exit, ebx=1: don't need it */ 845 /* returned flag: ebx=0: need swapgs on exit, ebx=1: don't need it */
1000 846
1001 DEFAULT_FRAME 0
1002
1003 .if \paranoid 847 .if \paranoid
1004 .if \shift_ist != -1 848 .if \shift_ist != -1
1005 TRACE_IRQS_OFF_DEBUG /* reload IDT in case of recursion */ 849 TRACE_IRQS_OFF_DEBUG /* reload IDT in case of recursion */
1006 .else 850 .else
1007 TRACE_IRQS_OFF 851 TRACE_IRQS_OFF
1008 .endif 852 .endif
1009 .endif 853 .endif
1010 854
1011 movq %rsp,%rdi /* pt_regs pointer */ 855 movq %rsp, %rdi /* pt_regs pointer */
1012 856
1013 .if \has_error_code 857 .if \has_error_code
1014 movq ORIG_RAX(%rsp),%rsi /* get error code */ 858 movq ORIG_RAX(%rsp), %rsi /* get error code */
1015 movq $-1,ORIG_RAX(%rsp) /* no syscall to restart */ 859 movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
1016 .else 860 .else
1017 xorl %esi,%esi /* no error code */ 861 xorl %esi, %esi /* no error code */
1018 .endif 862 .endif
1019 863
1020 .if \shift_ist != -1 864 .if \shift_ist != -1
1021 subq $EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist) 865 subq $EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
1022 .endif 866 .endif
1023 867
1024 call \do_sym 868 call \do_sym
1025 869
1026 .if \shift_ist != -1 870 .if \shift_ist != -1
1027 addq $EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist) 871 addq $EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
1028 .endif 872 .endif
1029 873
1030 /* these procedures expect "no swapgs" flag in ebx */ 874 /* these procedures expect "no swapgs" flag in ebx */
1031 .if \paranoid 875 .if \paranoid
1032 jmp paranoid_exit 876 jmp paranoid_exit
1033 .else 877 .else
1034 jmp error_exit 878 jmp error_exit
1035 .endif 879 .endif
1036 880
1037 .if \paranoid == 1 881 .if \paranoid == 1
1038 CFI_RESTORE_STATE
1039 /* 882 /*
1040 * Paranoid entry from userspace. Switch stacks and treat it 883 * Paranoid entry from userspace. Switch stacks and treat it
1041 * as a normal entry. This means that paranoid handlers 884 * as a normal entry. This means that paranoid handlers
1042 * run in real process context if user_mode(regs). 885 * run in real process context if user_mode(regs).
1043 */ 886 */
10441: 8871:
1045 call error_entry 888 call error_entry
1046 889
1047 DEFAULT_FRAME 0
1048 890
1049 movq %rsp,%rdi /* pt_regs pointer */ 891 movq %rsp, %rdi /* pt_regs pointer */
1050 call sync_regs 892 call sync_regs
1051 movq %rax,%rsp /* switch stack */ 893 movq %rax, %rsp /* switch stack */
1052 894
1053 movq %rsp,%rdi /* pt_regs pointer */ 895 movq %rsp, %rdi /* pt_regs pointer */
1054 896
1055 .if \has_error_code 897 .if \has_error_code
1056 movq ORIG_RAX(%rsp),%rsi /* get error code */ 898 movq ORIG_RAX(%rsp), %rsi /* get error code */
1057 movq $-1,ORIG_RAX(%rsp) /* no syscall to restart */ 899 movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
1058 .else 900 .else
1059 xorl %esi,%esi /* no error code */ 901 xorl %esi, %esi /* no error code */
1060 .endif 902 .endif
1061 903
1062 call \do_sym 904 call \do_sym
1063 905
1064 jmp error_exit /* %ebx: no swapgs flag */ 906 jmp error_exit /* %ebx: no swapgs flag */
1065 .endif 907 .endif
1066
1067 CFI_ENDPROC
1068END(\sym) 908END(\sym)
1069.endm 909.endm
1070 910
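To make the macro concrete, a sketch of what idtentry divide_error do_divide_error has_error_code=0 expands to, assuming the macro defaults (paranoid=0, shift_ist=-1), which are set in the macro header outside this hunk:

	ENTRY(divide_error)
		ASM_CLAC
		PARAVIRT_ADJUST_EXCEPTION_FRAME
		pushq	$-1			/* ORIG_RAX: hardware pushed no error code */
		ALLOC_PT_GPREGS_ON_STACK
		call	error_entry		/* save regs, SWAPGS if we came from user mode */
		movq	%rsp, %rdi		/* pt_regs pointer */
		xorl	%esi, %esi		/* no error code */
		call	do_divide_error
		jmp	error_exit
	END(divide_error)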
@@ -1079,65 +919,58 @@ idtentry \sym \do_sym has_error_code=\has_error_code
1079.endm 919.endm
1080#endif 920#endif
1081 921
1082idtentry divide_error do_divide_error has_error_code=0 922idtentry divide_error do_divide_error has_error_code=0
1083idtentry overflow do_overflow has_error_code=0 923idtentry overflow do_overflow has_error_code=0
1084idtentry bounds do_bounds has_error_code=0 924idtentry bounds do_bounds has_error_code=0
1085idtentry invalid_op do_invalid_op has_error_code=0 925idtentry invalid_op do_invalid_op has_error_code=0
1086idtentry device_not_available do_device_not_available has_error_code=0 926idtentry device_not_available do_device_not_available has_error_code=0
1087idtentry double_fault do_double_fault has_error_code=1 paranoid=2 927idtentry double_fault do_double_fault has_error_code=1 paranoid=2
1088idtentry coprocessor_segment_overrun do_coprocessor_segment_overrun has_error_code=0 928idtentry coprocessor_segment_overrun do_coprocessor_segment_overrun has_error_code=0
1089idtentry invalid_TSS do_invalid_TSS has_error_code=1 929idtentry invalid_TSS do_invalid_TSS has_error_code=1
1090idtentry segment_not_present do_segment_not_present has_error_code=1 930idtentry segment_not_present do_segment_not_present has_error_code=1
1091idtentry spurious_interrupt_bug do_spurious_interrupt_bug has_error_code=0 931idtentry spurious_interrupt_bug do_spurious_interrupt_bug has_error_code=0
1092idtentry coprocessor_error do_coprocessor_error has_error_code=0 932idtentry coprocessor_error do_coprocessor_error has_error_code=0
1093idtentry alignment_check do_alignment_check has_error_code=1 933idtentry alignment_check do_alignment_check has_error_code=1
1094idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0 934idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0
1095 935
1096 936
1097 /* Reload gs selector with exception handling */ 937 /*
1098 /* edi: new selector */ 938 * Reload gs selector with exception handling
939 * edi: new selector
940 */
1099ENTRY(native_load_gs_index) 941ENTRY(native_load_gs_index)
1100 CFI_STARTPROC 942 pushfq
1101 pushfq_cfi
1102 DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI) 943 DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
1103 SWAPGS 944 SWAPGS
1104gs_change: 945gs_change:
1105 movl %edi,%gs 946 movl %edi, %gs
11062: mfence /* workaround */ 9472: mfence /* workaround */
1107 SWAPGS 948 SWAPGS
1108 popfq_cfi 949 popfq
1109 ret 950 ret
1110 CFI_ENDPROC
1111END(native_load_gs_index) 951END(native_load_gs_index)
1112 952
1113 _ASM_EXTABLE(gs_change,bad_gs) 953 _ASM_EXTABLE(gs_change, bad_gs)
1114 .section .fixup,"ax" 954 .section .fixup, "ax"
1115 /* running with kernelgs */ 955 /* running with kernelgs */
1116bad_gs: 956bad_gs:
1117 SWAPGS /* switch back to user gs */ 957 SWAPGS /* switch back to user gs */
1118 xorl %eax,%eax 958 xorl %eax, %eax
1119 movl %eax,%gs 959 movl %eax, %gs
1120 jmp 2b 960 jmp 2b
1121 .previous 961 .previous
1122 962
1123/* Call softirq on interrupt stack. Interrupts are off. */ 963/* Call softirq on interrupt stack. Interrupts are off. */
1124ENTRY(do_softirq_own_stack) 964ENTRY(do_softirq_own_stack)
1125 CFI_STARTPROC 965 pushq %rbp
1126 pushq_cfi %rbp 966 mov %rsp, %rbp
1127 CFI_REL_OFFSET rbp,0 967 incl PER_CPU_VAR(irq_count)
1128 mov %rsp,%rbp 968 cmove PER_CPU_VAR(irq_stack_ptr), %rsp
1129 CFI_DEF_CFA_REGISTER rbp 969 push %rbp /* frame pointer backlink */
1130 incl PER_CPU_VAR(irq_count) 970 call __do_softirq
1131 cmove PER_CPU_VAR(irq_stack_ptr),%rsp
1132 push %rbp # backlink for old unwinder
1133 call __do_softirq
1134 leaveq 971 leaveq
1135 CFI_RESTORE rbp 972 decl PER_CPU_VAR(irq_count)
1136 CFI_DEF_CFA_REGISTER rsp
1137 CFI_ADJUST_CFA_OFFSET -8
1138 decl PER_CPU_VAR(irq_count)
1139 ret 973 ret
1140 CFI_ENDPROC
1141END(do_softirq_own_stack) 974END(do_softirq_own_stack)
1142 975
1143#ifdef CONFIG_XEN 976#ifdef CONFIG_XEN
@@ -1156,29 +989,24 @@ idtentry xen_hypervisor_callback xen_do_hypervisor_callback has_error_code=0
1156 * existing activation in its critical region -- if so, we pop the current 989 * existing activation in its critical region -- if so, we pop the current
1157 * activation and restart the handler using the previous one. 990 * activation and restart the handler using the previous one.
1158 */ 991 */
1159ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) 992ENTRY(xen_do_hypervisor_callback) /* do_hypervisor_callback(struct *pt_regs) */
1160 CFI_STARTPROC 993
1161/* 994/*
1162 * Since we don't modify %rdi, xen_evtchn_do_upcall(struct *pt_regs) will 995
1163 * see the correct pointer to the pt_regs 996 * see the correct pointer to the pt_regs
1164 */ 997 */
1165 movq %rdi, %rsp # we don't return, adjust the stack frame 998 movq %rdi, %rsp /* we don't return, adjust the stack frame */
1166 CFI_ENDPROC 99911: incl PER_CPU_VAR(irq_count)
1167 DEFAULT_FRAME 1000 movq %rsp, %rbp
116811: incl PER_CPU_VAR(irq_count) 1001 cmovzq PER_CPU_VAR(irq_stack_ptr), %rsp
1169 movq %rsp,%rbp 1002 pushq %rbp /* frame pointer backlink */
1170 CFI_DEF_CFA_REGISTER rbp 1003 call xen_evtchn_do_upcall
1171 cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp 1004 popq %rsp
1172 pushq %rbp # backlink for old unwinder 1005 decl PER_CPU_VAR(irq_count)
1173 call xen_evtchn_do_upcall
1174 popq %rsp
1175 CFI_DEF_CFA_REGISTER rsp
1176 decl PER_CPU_VAR(irq_count)
1177#ifndef CONFIG_PREEMPT 1006#ifndef CONFIG_PREEMPT
1178 call xen_maybe_preempt_hcall 1007 call xen_maybe_preempt_hcall
1179#endif 1008#endif
1180 jmp error_exit 1009 jmp error_exit
1181 CFI_ENDPROC
1182END(xen_do_hypervisor_callback) 1010END(xen_do_hypervisor_callback)
1183 1011
1184/* 1012/*
@@ -1195,51 +1023,35 @@ END(xen_do_hypervisor_callback)
1195 * with its current contents: any discrepancy means we are in category 1. 1023
1196 */ 1024 */
1197ENTRY(xen_failsafe_callback) 1025ENTRY(xen_failsafe_callback)
1198 INTR_FRAME 1 (6*8) 1026 movl %ds, %ecx
1199 /*CFI_REL_OFFSET gs,GS*/ 1027 cmpw %cx, 0x10(%rsp)
1200 /*CFI_REL_OFFSET fs,FS*/ 1028 jne 1f
1201 /*CFI_REL_OFFSET es,ES*/ 1029 movl %es, %ecx
1202 /*CFI_REL_OFFSET ds,DS*/ 1030 cmpw %cx, 0x18(%rsp)
1203 CFI_REL_OFFSET r11,8 1031 jne 1f
1204 CFI_REL_OFFSET rcx,0 1032 movl %fs, %ecx
1205 movw %ds,%cx 1033 cmpw %cx, 0x20(%rsp)
1206 cmpw %cx,0x10(%rsp) 1034 jne 1f
1207 CFI_REMEMBER_STATE 1035 movl %gs, %ecx
1208 jne 1f 1036 cmpw %cx, 0x28(%rsp)
1209 movw %es,%cx 1037 jne 1f
1210 cmpw %cx,0x18(%rsp)
1211 jne 1f
1212 movw %fs,%cx
1213 cmpw %cx,0x20(%rsp)
1214 jne 1f
1215 movw %gs,%cx
1216 cmpw %cx,0x28(%rsp)
1217 jne 1f
1218 /* All segments match their saved values => Category 2 (Bad IRET). */ 1038 /* All segments match their saved values => Category 2 (Bad IRET). */
1219 movq (%rsp),%rcx 1039 movq (%rsp), %rcx
1220 CFI_RESTORE rcx 1040 movq 8(%rsp), %r11
1221 movq 8(%rsp),%r11 1041 addq $0x30, %rsp
1222 CFI_RESTORE r11 1042 pushq $0 /* RIP */
1223 addq $0x30,%rsp 1043 pushq %r11
1224 CFI_ADJUST_CFA_OFFSET -0x30 1044 pushq %rcx
1225 pushq_cfi $0 /* RIP */ 1045 jmp general_protection
1226 pushq_cfi %r11
1227 pushq_cfi %rcx
1228 jmp general_protection
1229 CFI_RESTORE_STATE
12301: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */ 10461: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
1231 movq (%rsp),%rcx 1047 movq (%rsp), %rcx
1232 CFI_RESTORE rcx 1048 movq 8(%rsp), %r11
1233 movq 8(%rsp),%r11 1049 addq $0x30, %rsp
1234 CFI_RESTORE r11 1050 pushq $-1 /* orig_ax = -1 => not a system call */
1235 addq $0x30,%rsp
1236 CFI_ADJUST_CFA_OFFSET -0x30
1237 pushq_cfi $-1 /* orig_ax = -1 => not a system call */
1238 ALLOC_PT_GPREGS_ON_STACK 1051 ALLOC_PT_GPREGS_ON_STACK
1239 SAVE_C_REGS 1052 SAVE_C_REGS
1240 SAVE_EXTRA_REGS 1053 SAVE_EXTRA_REGS
1241 jmp error_exit 1054 jmp error_exit
1242 CFI_ENDPROC
1243END(xen_failsafe_callback) 1055END(xen_failsafe_callback)
1244 1056
1245apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ 1057apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
@@ -1252,21 +1064,25 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
1252 hyperv_callback_vector hyperv_vector_handler 1064 hyperv_callback_vector hyperv_vector_handler
1253#endif /* CONFIG_HYPERV */ 1065#endif /* CONFIG_HYPERV */
1254 1066
1255idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK 1067idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
1256idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK 1068idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
1257idtentry stack_segment do_stack_segment has_error_code=1 1069idtentry stack_segment do_stack_segment has_error_code=1
1070
1258#ifdef CONFIG_XEN 1071#ifdef CONFIG_XEN
1259idtentry xen_debug do_debug has_error_code=0 1072idtentry xen_debug do_debug has_error_code=0
1260idtentry xen_int3 do_int3 has_error_code=0 1073idtentry xen_int3 do_int3 has_error_code=0
1261idtentry xen_stack_segment do_stack_segment has_error_code=1 1074idtentry xen_stack_segment do_stack_segment has_error_code=1
1262#endif 1075#endif
1263idtentry general_protection do_general_protection has_error_code=1 1076
1264trace_idtentry page_fault do_page_fault has_error_code=1 1077idtentry general_protection do_general_protection has_error_code=1
1078trace_idtentry page_fault do_page_fault has_error_code=1
1079
1265#ifdef CONFIG_KVM_GUEST 1080#ifdef CONFIG_KVM_GUEST
1266idtentry async_page_fault do_async_page_fault has_error_code=1 1081idtentry async_page_fault do_async_page_fault has_error_code=1
1267#endif 1082#endif
1083
1268#ifdef CONFIG_X86_MCE 1084#ifdef CONFIG_X86_MCE
1269idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(%rip) 1085idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(%rip)
1270#endif 1086#endif
1271 1087
1272/* 1088/*
@@ -1275,19 +1091,17 @@ idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(
1275 * Return: ebx=0: need swapgs on exit, ebx=1: otherwise 1091 * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
1276 */ 1092 */
1277ENTRY(paranoid_entry) 1093ENTRY(paranoid_entry)
1278 XCPT_FRAME 1 15*8
1279 cld 1094 cld
1280 SAVE_C_REGS 8 1095 SAVE_C_REGS 8
1281 SAVE_EXTRA_REGS 8 1096 SAVE_EXTRA_REGS 8
1282 movl $1,%ebx 1097 movl $1, %ebx
1283 movl $MSR_GS_BASE,%ecx 1098 movl $MSR_GS_BASE, %ecx
1284 rdmsr 1099 rdmsr
1285 testl %edx,%edx 1100 testl %edx, %edx
1286 js 1f /* negative -> in kernel */ 1101 js 1f /* negative -> in kernel */
1287 SWAPGS 1102 SWAPGS
1288 xorl %ebx,%ebx 1103 xorl %ebx, %ebx
12891: ret 11041: ret
1290 CFI_ENDPROC
1291END(paranoid_entry) 1105END(paranoid_entry)
1292 1106
1293/* 1107/*
@@ -1299,17 +1113,17 @@ END(paranoid_entry)
1299 * in syscall entry), so checking for preemption here would 1113 * in syscall entry), so checking for preemption here would
1300 * be complicated. Fortunately, there's no good reason 1114
1301 * to try to handle preemption here. 1115 * to try to handle preemption here.
1116 *
1117 * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
1302 */ 1118 */
1303/* On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) */
1304ENTRY(paranoid_exit) 1119ENTRY(paranoid_exit)
1305 DEFAULT_FRAME
1306 DISABLE_INTERRUPTS(CLBR_NONE) 1120 DISABLE_INTERRUPTS(CLBR_NONE)
1307 TRACE_IRQS_OFF_DEBUG 1121 TRACE_IRQS_OFF_DEBUG
1308 testl %ebx,%ebx /* swapgs needed? */ 1122 testl %ebx, %ebx /* swapgs needed? */
1309 jnz paranoid_exit_no_swapgs 1123 jnz paranoid_exit_no_swapgs
1310 TRACE_IRQS_IRETQ 1124 TRACE_IRQS_IRETQ
1311 SWAPGS_UNSAFE_STACK 1125 SWAPGS_UNSAFE_STACK
1312 jmp paranoid_exit_restore 1126 jmp paranoid_exit_restore
1313paranoid_exit_no_swapgs: 1127paranoid_exit_no_swapgs:
1314 TRACE_IRQS_IRETQ_DEBUG 1128 TRACE_IRQS_IRETQ_DEBUG
1315paranoid_exit_restore: 1129paranoid_exit_restore:
@@ -1317,24 +1131,24 @@ paranoid_exit_restore:
1317 RESTORE_C_REGS 1131 RESTORE_C_REGS
1318 REMOVE_PT_GPREGS_FROM_STACK 8 1132 REMOVE_PT_GPREGS_FROM_STACK 8
1319 INTERRUPT_RETURN 1133 INTERRUPT_RETURN
1320 CFI_ENDPROC
1321END(paranoid_exit) 1134END(paranoid_exit)
1322 1135
1323/* 1136/*
1324 * Save all registers in pt_regs, and switch gs if needed. 1137 * Save all registers in pt_regs, and switch gs if needed.
1325 * Return: ebx=0: need swapgs on exit, ebx=1: otherwise 1138 * Return: EBX=0: came from user mode; EBX=1: otherwise
1326 */ 1139 */
1327ENTRY(error_entry) 1140ENTRY(error_entry)
1328 XCPT_FRAME 1 15*8
1329 cld 1141 cld
1330 SAVE_C_REGS 8 1142 SAVE_C_REGS 8
1331 SAVE_EXTRA_REGS 8 1143 SAVE_EXTRA_REGS 8
1332 xorl %ebx,%ebx 1144 xorl %ebx, %ebx
1333 testl $3,CS+8(%rsp) 1145 testb $3, CS+8(%rsp)
1334 je error_kernelspace 1146 jz error_kernelspace
1335error_swapgs: 1147
1148 /* We entered from user mode */
1336 SWAPGS 1149 SWAPGS
1337error_sti: 1150
1151error_entry_done:
1338 TRACE_IRQS_OFF 1152 TRACE_IRQS_OFF
1339 ret 1153 ret
1340 1154
@@ -1345,56 +1159,66 @@ error_sti:
1345 * for these here too. 1159 * for these here too.
1346 */ 1160 */
1347error_kernelspace: 1161error_kernelspace:
1348 CFI_REL_OFFSET rcx, RCX+8 1162 incl %ebx
1349 incl %ebx 1163 leaq native_irq_return_iret(%rip), %rcx
1350 leaq native_irq_return_iret(%rip),%rcx 1164 cmpq %rcx, RIP+8(%rsp)
1351 cmpq %rcx,RIP+8(%rsp) 1165 je error_bad_iret
1352 je error_bad_iret 1166 movl %ecx, %eax /* zero extend */
1353 movl %ecx,%eax /* zero extend */ 1167 cmpq %rax, RIP+8(%rsp)
1354 cmpq %rax,RIP+8(%rsp) 1168 je bstep_iret
1355 je bstep_iret 1169 cmpq $gs_change, RIP+8(%rsp)
1356 cmpq $gs_change,RIP+8(%rsp) 1170 jne error_entry_done
1357 je error_swapgs 1171
1358 jmp error_sti 1172 /*
1173 * hack: gs_change can fail with user gsbase. If this happens, fix up
1174 * gsbase and proceed. We'll fix up the exception and land in
1175 * gs_change's error handler with kernel gsbase.
1176 */
1177 SWAPGS
1178 jmp error_entry_done
1359 1179
1360bstep_iret: 1180bstep_iret:
1361 /* Fix truncated RIP */ 1181 /* Fix truncated RIP */
1362 movq %rcx,RIP+8(%rsp) 1182 movq %rcx, RIP+8(%rsp)
1363 /* fall through */ 1183 /* fall through */
1364 1184
1365error_bad_iret: 1185error_bad_iret:
1186 /*
1187 * We came from an IRET to user mode, so we have user gsbase.
1188 * Switch to kernel gsbase:
1189 */
1366 SWAPGS 1190 SWAPGS
1367 mov %rsp,%rdi 1191
1368 call fixup_bad_iret 1192 /*
1369 mov %rax,%rsp 1193 * Pretend that the exception came from user mode: set up pt_regs
1370 decl %ebx /* Return to usergs */ 1194 * as if we faulted immediately after IRET and clear EBX so that
1371 jmp error_sti 1195 * error_exit knows that we will be returning to user mode.
1372 CFI_ENDPROC 1196 */
1197 mov %rsp, %rdi
1198 call fixup_bad_iret
1199 mov %rax, %rsp
1200 decl %ebx
1201 jmp error_entry_done
1373END(error_entry) 1202END(error_entry)
1374 1203
1375 1204
1376/* On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) */ 1205/*
1206 * On entry, EBX is a "return to kernel mode" flag:
1207 * 1: already in kernel mode, don't need SWAPGS
1208 * 0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode
1209 */
1377ENTRY(error_exit) 1210ENTRY(error_exit)
1378 DEFAULT_FRAME 1211 movl %ebx, %eax
1379 movl %ebx,%eax
1380 RESTORE_EXTRA_REGS 1212 RESTORE_EXTRA_REGS
1381 DISABLE_INTERRUPTS(CLBR_NONE) 1213 DISABLE_INTERRUPTS(CLBR_NONE)
1382 TRACE_IRQS_OFF 1214 TRACE_IRQS_OFF
1383 GET_THREAD_INFO(%rcx) 1215 testl %eax, %eax
1384 testl %eax,%eax 1216 jnz retint_kernel
1385 jne retint_kernel 1217 jmp retint_user
1386 LOCKDEP_SYS_EXIT_IRQ
1387 movl TI_flags(%rcx),%edx
1388 movl $_TIF_WORK_MASK,%edi
1389 andl %edi,%edx
1390 jnz retint_careful
1391 jmp retint_swapgs
1392 CFI_ENDPROC
1393END(error_exit) 1218END(error_exit)
1394 1219
1395/* Runs on exception stack */ 1220/* Runs on exception stack */
1396ENTRY(nmi) 1221ENTRY(nmi)
1397 INTR_FRAME
1398 PARAVIRT_ADJUST_EXCEPTION_FRAME 1222 PARAVIRT_ADJUST_EXCEPTION_FRAME
1399 /* 1223 /*
1400 * We allow breakpoints in NMIs. If a breakpoint occurs, then 1224 * We allow breakpoints in NMIs. If a breakpoint occurs, then
@@ -1429,22 +1253,21 @@ ENTRY(nmi)
1429 */ 1253 */
1430 1254
1431 /* Use %rdx as our temp variable throughout */ 1255 /* Use %rdx as our temp variable throughout */
1432 pushq_cfi %rdx 1256 pushq %rdx
1433 CFI_REL_OFFSET rdx, 0
1434 1257
1435 /* 1258 /*
1436 * If %cs was not the kernel segment, then the NMI triggered in user 1259 * If %cs was not the kernel segment, then the NMI triggered in user
1437 * space, which means it is definitely not nested. 1260 * space, which means it is definitely not nested.
1438 */ 1261 */
1439 cmpl $__KERNEL_CS, 16(%rsp) 1262 cmpl $__KERNEL_CS, 16(%rsp)
1440 jne first_nmi 1263 jne first_nmi
1441 1264
1442 /* 1265 /*
1443 * Check the special variable on the stack to see if NMIs are 1266 * Check the special variable on the stack to see if NMIs are
1444 * executing. 1267 * executing.
1445 */ 1268 */
1446 cmpl $1, -8(%rsp) 1269 cmpl $1, -8(%rsp)
1447 je nested_nmi 1270 je nested_nmi
1448 1271
1449 /* 1272 /*
1450 * Now test if the previous stack was an NMI stack. 1273 * Now test if the previous stack was an NMI stack.
@@ -1458,51 +1281,46 @@ ENTRY(nmi)
1458 cmpq %rdx, 4*8(%rsp) 1281 cmpq %rdx, 4*8(%rsp)
1459 /* If the stack pointer is above the NMI stack, this is a normal NMI */ 1282 /* If the stack pointer is above the NMI stack, this is a normal NMI */
1460 ja first_nmi 1283 ja first_nmi
1284
1461 subq $EXCEPTION_STKSZ, %rdx 1285 subq $EXCEPTION_STKSZ, %rdx
1462 cmpq %rdx, 4*8(%rsp) 1286 cmpq %rdx, 4*8(%rsp)
1463 /* If it is below the NMI stack, it is a normal NMI */ 1287 /* If it is below the NMI stack, it is a normal NMI */
1464 jb first_nmi 1288 jb first_nmi
1465 /* Ah, it is within the NMI stack, treat it as nested */ 1289 /* Ah, it is within the NMI stack, treat it as nested */
1466 1290
1467 CFI_REMEMBER_STATE
1468
1469nested_nmi: 1291nested_nmi:
1470 /* 1292 /*
1471 * Do nothing if we interrupted the fixup in repeat_nmi. 1293 * Do nothing if we interrupted the fixup in repeat_nmi.
1472 * It's about to repeat the NMI handler, so we are fine 1294 * It's about to repeat the NMI handler, so we are fine
1473 * with ignoring this one. 1295 * with ignoring this one.
1474 */ 1296 */
1475 movq $repeat_nmi, %rdx 1297 movq $repeat_nmi, %rdx
1476 cmpq 8(%rsp), %rdx 1298 cmpq 8(%rsp), %rdx
1477 ja 1f 1299 ja 1f
1478 movq $end_repeat_nmi, %rdx 1300 movq $end_repeat_nmi, %rdx
1479 cmpq 8(%rsp), %rdx 1301 cmpq 8(%rsp), %rdx
1480 ja nested_nmi_out 1302 ja nested_nmi_out
1481 1303
14821: 13041:
1483 /* Set up the interrupted NMIs stack to jump to repeat_nmi */ 1305 /* Set up the interrupted NMIs stack to jump to repeat_nmi */
1484 leaq -1*8(%rsp), %rdx 1306 leaq -1*8(%rsp), %rdx
1485 movq %rdx, %rsp 1307 movq %rdx, %rsp
1486 CFI_ADJUST_CFA_OFFSET 1*8 1308 leaq -10*8(%rsp), %rdx
1487 leaq -10*8(%rsp), %rdx 1309 pushq $__KERNEL_DS
1488 pushq_cfi $__KERNEL_DS 1310 pushq %rdx
1489 pushq_cfi %rdx 1311 pushfq
1490 pushfq_cfi 1312 pushq $__KERNEL_CS
1491 pushq_cfi $__KERNEL_CS 1313 pushq $repeat_nmi
1492 pushq_cfi $repeat_nmi
1493 1314
1494 /* Put stack back */ 1315 /* Put stack back */
1495 addq $(6*8), %rsp 1316 addq $(6*8), %rsp
1496 CFI_ADJUST_CFA_OFFSET -6*8
1497 1317
1498nested_nmi_out: 1318nested_nmi_out:
1499 popq_cfi %rdx 1319 popq %rdx
1500 CFI_RESTORE rdx
1501 1320
1502 /* No need to check faults here */ 1321 /* No need to check faults here */
1503 INTERRUPT_RETURN 1322 INTERRUPT_RETURN
1504 1323
1505 CFI_RESTORE_STATE
1506first_nmi: 1324first_nmi:
1507 /* 1325 /*
1508 * Because nested NMIs will use the pushed location that we 1326 * Because nested NMIs will use the pushed location that we
@@ -1540,23 +1358,18 @@ first_nmi:
1540 * is also used by nested NMIs and can not be trusted on exit. 1358 * is also used by nested NMIs and can not be trusted on exit.
1541 */ 1359 */
1542 /* Do not pop rdx, nested NMIs will corrupt that part of the stack */ 1360 /* Do not pop rdx, nested NMIs will corrupt that part of the stack */
1543 movq (%rsp), %rdx 1361 movq (%rsp), %rdx
1544 CFI_RESTORE rdx
1545 1362
1546 /* Set the NMI executing variable on the stack. */ 1363 /* Set the NMI executing variable on the stack. */
1547 pushq_cfi $1 1364 pushq $1
1548 1365
1549 /* 1366 /* Leave room for the "copied" frame */
1550 * Leave room for the "copied" frame 1367 subq $(5*8), %rsp
1551 */
1552 subq $(5*8), %rsp
1553 CFI_ADJUST_CFA_OFFSET 5*8
1554 1368
1555 /* Copy the stack frame to the Saved frame */ 1369 /* Copy the stack frame to the Saved frame */
1556 .rept 5 1370 .rept 5
1557 pushq_cfi 11*8(%rsp) 1371 pushq 11*8(%rsp)
1558 .endr 1372 .endr
1559 CFI_DEF_CFA_OFFSET 5*8
1560 1373
1561 /* Everything up to here is safe from nested NMIs */ 1374 /* Everything up to here is safe from nested NMIs */
1562 1375
@@ -1575,16 +1388,14 @@ repeat_nmi:
1575 * is benign for the non-repeat case, where 1 was pushed just above 1388 * is benign for the non-repeat case, where 1 was pushed just above
1576 * to this very stack slot). 1389 * to this very stack slot).
1577 */ 1390 */
1578 movq $1, 10*8(%rsp) 1391 movq $1, 10*8(%rsp)
1579 1392
1580 /* Make another copy, this one may be modified by nested NMIs */ 1393 /* Make another copy, this one may be modified by nested NMIs */
1581 addq $(10*8), %rsp 1394 addq $(10*8), %rsp
1582 CFI_ADJUST_CFA_OFFSET -10*8
1583 .rept 5 1395 .rept 5
1584 pushq_cfi -6*8(%rsp) 1396 pushq -6*8(%rsp)
1585 .endr 1397 .endr
1586 subq $(5*8), %rsp 1398 subq $(5*8), %rsp
1587 CFI_DEF_CFA_OFFSET 5*8
1588end_repeat_nmi: 1399end_repeat_nmi:
1589 1400
1590 /* 1401 /*
@@ -1592,7 +1403,7 @@ end_repeat_nmi:
1592 * NMI if the first NMI took an exception and reset our iret stack 1403 * NMI if the first NMI took an exception and reset our iret stack
1593 * so that we repeat another NMI. 1404 * so that we repeat another NMI.
1594 */ 1405 */
1595 pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */ 1406 pushq $-1 /* ORIG_RAX: no syscall to restart */
1596 ALLOC_PT_GPREGS_ON_STACK 1407 ALLOC_PT_GPREGS_ON_STACK
1597 1408
1598 /* 1409 /*
@@ -1602,8 +1413,7 @@ end_repeat_nmi:
1602 * setting NEED_RESCHED or anything that normal interrupts and 1413 * setting NEED_RESCHED or anything that normal interrupts and
1603 * exceptions might do. 1414 * exceptions might do.
1604 */ 1415 */
1605 call paranoid_entry 1416 call paranoid_entry
1606 DEFAULT_FRAME 0
1607 1417
1608 /* 1418 /*
1609 * Save off the CR2 register. If we take a page fault in the NMI then 1419 * Save off the CR2 register. If we take a page fault in the NMI then
@@ -1614,22 +1424,21 @@ end_repeat_nmi:
1614 * origin fault. Save it off and restore it if it changes. 1424 * origin fault. Save it off and restore it if it changes.
1615 * Use the r12 callee-saved register. 1425 * Use the r12 callee-saved register.
1616 */ 1426 */
1617 movq %cr2, %r12 1427 movq %cr2, %r12
1618 1428
1619 /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */ 1429 /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
1620 movq %rsp,%rdi 1430 movq %rsp, %rdi
1621 movq $-1,%rsi 1431 movq $-1, %rsi
1622 call do_nmi 1432 call do_nmi
1623 1433
1624 /* Did the NMI take a page fault? Restore cr2 if it did */ 1434 /* Did the NMI take a page fault? Restore cr2 if it did */
1625 movq %cr2, %rcx 1435 movq %cr2, %rcx
1626 cmpq %rcx, %r12 1436 cmpq %rcx, %r12
1627 je 1f 1437 je 1f
1628 movq %r12, %cr2 1438 movq %r12, %cr2
16291: 14391:
1630 1440 testl %ebx, %ebx /* swapgs needed? */
1631 testl %ebx,%ebx /* swapgs needed? */ 1441 jnz nmi_restore
1632 jnz nmi_restore
1633nmi_swapgs: 1442nmi_swapgs:
1634 SWAPGS_UNSAFE_STACK 1443 SWAPGS_UNSAFE_STACK
1635nmi_restore: 1444nmi_restore:
@@ -1639,15 +1448,11 @@ nmi_restore:
1639 REMOVE_PT_GPREGS_FROM_STACK 6*8 1448 REMOVE_PT_GPREGS_FROM_STACK 6*8
1640 1449
1641 /* Clear the NMI executing stack variable */ 1450 /* Clear the NMI executing stack variable */
1642 movq $0, 5*8(%rsp) 1451 movq $0, 5*8(%rsp)
1643 jmp irq_return 1452 INTERRUPT_RETURN
1644 CFI_ENDPROC
1645END(nmi) 1453END(nmi)
1646 1454
1647ENTRY(ignore_sysret) 1455ENTRY(ignore_sysret)
1648 CFI_STARTPROC 1456 mov $-ENOSYS, %eax
1649 mov $-ENOSYS,%eax
1650 sysret 1457 sysret
1651 CFI_ENDPROC
1652END(ignore_sysret) 1458END(ignore_sysret)
1653
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
new file mode 100644
index 000000000000..bb187a6a877c
--- /dev/null
+++ b/arch/x86/entry/entry_64_compat.S
@@ -0,0 +1,556 @@
1/*
2 * Compatibility mode system call entry point for x86-64.
3 *
4 * Copyright 2000-2002 Andi Kleen, SuSE Labs.
5 */
6#include "calling.h"
7#include <asm/asm-offsets.h>
8#include <asm/current.h>
9#include <asm/errno.h>
10#include <asm/ia32_unistd.h>
11#include <asm/thread_info.h>
12#include <asm/segment.h>
13#include <asm/irqflags.h>
14#include <asm/asm.h>
15#include <asm/smap.h>
16#include <linux/linkage.h>
17#include <linux/err.h>
18
19/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
20#include <linux/elf-em.h>
21#define AUDIT_ARCH_I386 (EM_386|__AUDIT_ARCH_LE)
22#define __AUDIT_ARCH_LE 0x40000000
23
24#ifndef CONFIG_AUDITSYSCALL
25# define sysexit_audit ia32_ret_from_sys_call
26# define sysretl_audit ia32_ret_from_sys_call
27#endif
28
29 .section .entry.text, "ax"
30
31#ifdef CONFIG_PARAVIRT
32ENTRY(native_usergs_sysret32)
33 swapgs
34 sysretl
35ENDPROC(native_usergs_sysret32)
36#endif
37
38/*
39 * 32-bit SYSENTER instruction entry.
40 *
41 * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
42 * IF and VM in rflags are cleared (IOW: interrupts are off).
43 * SYSENTER does not save anything on the stack,
44 * and does not save old rip (!!!) and rflags.
45 *
46 * Arguments:
47 * eax system call number
48 * ebx arg1
49 * ecx arg2
50 * edx arg3
51 * esi arg4
52 * edi arg5
53 * ebp user stack
54 * 0(%ebp) arg6
55 *
56 * This is purely a fast path. For anything complicated we use the int 0x80
57 * path below. We set up a complete hardware stack frame to share code
58 * with the int 0x80 path.
59 */
60ENTRY(entry_SYSENTER_compat)
61 /*
62 * Interrupts are off on entry.
63 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
64 * it is too small to ever cause noticeable irq latency.
65 */
66 SWAPGS_UNSAFE_STACK
67 movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
68 ENABLE_INTERRUPTS(CLBR_NONE)
69
70 /* Zero-extending 32-bit regs, do not remove */
71 movl %ebp, %ebp
72 movl %eax, %eax
73
74 movl ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d
75
76 /* Construct struct pt_regs on stack */
77 pushq $__USER32_DS /* pt_regs->ss */
78 pushq %rbp /* pt_regs->sp */
79 pushfq /* pt_regs->flags */
80 pushq $__USER32_CS /* pt_regs->cs */
81 pushq %r10 /* pt_regs->ip = thread_info->sysenter_return */
82 pushq %rax /* pt_regs->orig_ax */
83 pushq %rdi /* pt_regs->di */
84 pushq %rsi /* pt_regs->si */
85 pushq %rdx /* pt_regs->dx */
86 pushq %rcx /* pt_regs->cx */
87 pushq $-ENOSYS /* pt_regs->ax */
88 cld
89 sub $(10*8), %rsp /* pt_regs->r8-11, bp, bx, r12-15 not saved */
90
91 /*
92 * no need to do an access_ok check here because rbp has been
93 * 32-bit zero extended
94 */
95 ASM_STAC
961: movl (%rbp), %ebp
97 _ASM_EXTABLE(1b, ia32_badarg)
98 ASM_CLAC
99
100 /*
101 * Sysenter doesn't filter flags, so we need to clear NT
102 * ourselves. To save a few cycles, we can check whether
103 * NT was set instead of doing an unconditional popfq.
104 */
105 testl $X86_EFLAGS_NT, EFLAGS(%rsp)
106 jnz sysenter_fix_flags
107sysenter_flags_fixed:
108
109 orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
110 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
111 jnz sysenter_tracesys
112
113sysenter_do_call:
114 /* 32-bit syscall -> 64-bit C ABI argument conversion */
115 movl %edi, %r8d /* arg5 */
116 movl %ebp, %r9d /* arg6 */
117 xchg %ecx, %esi /* rsi:arg2, rcx:arg4 */
118 movl %ebx, %edi /* arg1 */
119 movl %edx, %edx /* arg3 (zero extension) */
120sysenter_dispatch:
121 cmpq $(IA32_NR_syscalls-1), %rax
122 ja 1f
123 call *ia32_sys_call_table(, %rax, 8)
124 movq %rax, RAX(%rsp)
1251:
126 DISABLE_INTERRUPTS(CLBR_NONE)
127 TRACE_IRQS_OFF
128 testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
129 jnz sysexit_audit
130sysexit_from_sys_call:
131 /*
132 * NB: SYSEXIT is not obviously safe for 64-bit kernels -- an
133 * NMI between STI and SYSEXIT has poorly specified behavior,
134 * and an NMI followed by an IRQ with usergs is fatal. So
135 * we just pretend we're using SYSEXIT but we really use
136 * SYSRETL instead.
137 *
138 * This code path is still called 'sysexit' because it pairs
139 * with 'sysenter' and it uses the SYSENTER calling convention.
140 */
141 andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
142 movl RIP(%rsp), %ecx /* User %eip */
143 RESTORE_RSI_RDI
144 xorl %edx, %edx /* Do not leak kernel information */
145 xorq %r8, %r8
146 xorq %r9, %r9
147 xorq %r10, %r10
148 movl EFLAGS(%rsp), %r11d /* User eflags */
149 TRACE_IRQS_ON
150
151 /*
152 * SYSRETL works even on Intel CPUs. Use it in preference to SYSEXIT,
153 * since it avoids a dicey window with interrupts enabled.
154 */
155 movl RSP(%rsp), %esp
156
157 /*
158 * USERGS_SYSRET32 does:
159 * gsbase = user's gs base
160 * eip = ecx
161 * rflags = r11
162 * cs = __USER32_CS
163 * ss = __USER_DS
164 *
165 * The prologue set RIP(%rsp) to VDSO32_SYSENTER_RETURN, which does:
166 *
167 * pop %ebp
168 * pop %edx
169 * pop %ecx
170 *
171 * Therefore, we invoke SYSRETL with EDX and R8-R10 zeroed to
172 * avoid info leaks. R11 ends up with VDSO32_SYSENTER_RETURN's
173 * address (already known to user code), and R12-R15 are
174 * callee-saved and therefore don't contain any interesting
175 * kernel data.
176 */
177 USERGS_SYSRET32
178
179#ifdef CONFIG_AUDITSYSCALL
180 .macro auditsys_entry_common
181 /*
182 * At this point, registers hold syscall args in the 32-bit syscall ABI:
183 * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,EBP.
184 *
185 * We want to pass them to __audit_syscall_entry(), which is a 64-bit
186 * C function with 5 parameters, so shuffle them to match what
187 * the function expects: RDI,RSI,RDX,RCX,R8.
188 */
189 movl %esi, %r8d /* arg5 (R8 ) <= 4th syscall arg (ESI) */
190 xchg %ecx, %edx /* arg4 (RCX) <= 3rd syscall arg (EDX) */
191 /* arg3 (RDX) <= 2nd syscall arg (ECX) */
192 movl %ebx, %esi /* arg2 (RSI) <= 1st syscall arg (EBX) */
193 movl %eax, %edi /* arg1 (RDI) <= syscall number (EAX) */
194 call __audit_syscall_entry
195
196 /*
197 * We are going to jump back to the syscall dispatch code.
198 * Prepare syscall args as required by the 64-bit C ABI.
199 * Registers clobbered by __audit_syscall_entry() are
200 * loaded from pt_regs on stack:
201 */
202 movl ORIG_RAX(%rsp), %eax /* syscall number */
203 movl %ebx, %edi /* arg1 */
204 movl RCX(%rsp), %esi /* arg2 */
205 movl RDX(%rsp), %edx /* arg3 */
206 movl RSI(%rsp), %ecx /* arg4 */
207 movl RDI(%rsp), %r8d /* arg5 */
208 movl %ebp, %r9d /* arg6 */
209 .endm
210
211 .macro auditsys_exit exit
212 testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
213 jnz ia32_ret_from_sys_call
214 TRACE_IRQS_ON
215 ENABLE_INTERRUPTS(CLBR_NONE)
216 movl %eax, %esi /* second arg, syscall return value */
217 cmpl $-MAX_ERRNO, %eax /* is it an error ? */
218 jbe 1f
219 movslq %eax, %rsi /* if error sign extend to 64 bits */
2201: setbe %al /* 1 if error, 0 if not */
221 movzbl %al, %edi /* zero-extend that into %edi */
222 call __audit_syscall_exit
223 movq RAX(%rsp), %rax /* reload syscall return value */
224 movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
225 DISABLE_INTERRUPTS(CLBR_NONE)
226 TRACE_IRQS_OFF
227 testl %edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
228 jz \exit
229 xorl %eax, %eax /* Do not leak kernel information */
230 movq %rax, R11(%rsp)
231 movq %rax, R10(%rsp)
232 movq %rax, R9(%rsp)
233 movq %rax, R8(%rsp)
234 jmp int_with_check
235 .endm
236
237sysenter_auditsys:
238 auditsys_entry_common
239 jmp sysenter_dispatch
240
241sysexit_audit:
242 auditsys_exit sysexit_from_sys_call
243#endif
244
245sysenter_fix_flags:
246 pushq $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
247 popfq
248 jmp sysenter_flags_fixed
249
250sysenter_tracesys:
251#ifdef CONFIG_AUDITSYSCALL
252 testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
253 jz sysenter_auditsys
254#endif
255 SAVE_EXTRA_REGS
256 xorl %eax, %eax /* Do not leak kernel information */
257 movq %rax, R11(%rsp)
258 movq %rax, R10(%rsp)
259 movq %rax, R9(%rsp)
260 movq %rax, R8(%rsp)
261 movq %rsp, %rdi /* &pt_regs -> arg1 */
262 call syscall_trace_enter
263
264 /* Reload arg registers from stack. (see sysenter_tracesys) */
265 movl RCX(%rsp), %ecx
266 movl RDX(%rsp), %edx
267 movl RSI(%rsp), %esi
268 movl RDI(%rsp), %edi
269 movl %eax, %eax /* zero extension */
270
271 RESTORE_EXTRA_REGS
272 jmp sysenter_do_call
273ENDPROC(entry_SYSENTER_compat)
274
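For context on how user space reaches this entry point: 32-bit programs call the vDSO's __kernel_vsyscall, which on SYSENTER-capable CPUs looks roughly like the sketch below (simplified; the real vDSO also contains an int $0x80 restart slot). It shows why ebp holds the user stack pointer and why the return point pops ebp, edx and ecx, as noted in the comments above:

	__kernel_vsyscall:
		push	%ecx
		push	%edx
		push	%ebp
		movl	%esp, %ebp		/* ebp = user stack, so 0(%ebp) is arg6 */
		sysenter
		/* kernel returns here (VDSO32_SYSENTER_RETURN): */
		pop	%ebp
		pop	%edx
		pop	%ecx
		ret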
275/*
276 * 32-bit SYSCALL instruction entry.
277 *
278 * 32-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
279 * then loads new ss, cs, and rip from previously programmed MSRs.
280 * rflags gets masked by a value from another MSR (so CLD and CLAC
281 * are not needed). SYSCALL does not save anything on the stack
282 * and does not change rsp.
283 *
284 * Note: rflags saving+masking-with-MSR happens only in Long mode
285 * (in legacy 32-bit mode, IF, RF and VM bits are cleared and that's it).
286 * Don't get confused: rflags saving+masking depends on Long Mode Active bit
287 * (EFER.LMA=1), NOT on bitness of userspace where SYSCALL executes
288 * or target CS descriptor's L bit (SYSCALL does not read segment descriptors).
289 *
290 * Arguments:
291 * eax system call number
292 * ecx return address
293 * ebx arg1
294 * ebp arg2 (note: not saved in the stack frame, should not be touched)
295 * edx arg3
296 * esi arg4
297 * edi arg5
298 * esp user stack
299 * 0(%esp) arg6
300 *
301 * This is purely a fast path. For anything complicated we use the int 0x80
302 * path below. We set up a complete hardware stack frame to share code
303 * with the int 0x80 path.
304 */
305ENTRY(entry_SYSCALL_compat)
306 /*
307 * Interrupts are off on entry.
308 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
309 * it is too small to ever cause noticeable irq latency.
310 */
311 SWAPGS_UNSAFE_STACK
312 movl %esp, %r8d
313 movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
314 ENABLE_INTERRUPTS(CLBR_NONE)
315
316 /* Zero-extending 32-bit regs, do not remove */
317 movl %eax, %eax
318
319 /* Construct struct pt_regs on stack */
320 pushq $__USER32_DS /* pt_regs->ss */
321 pushq %r8 /* pt_regs->sp */
322 pushq %r11 /* pt_regs->flags */
323 pushq $__USER32_CS /* pt_regs->cs */
324 pushq %rcx /* pt_regs->ip */
325 pushq %rax /* pt_regs->orig_ax */
326 pushq %rdi /* pt_regs->di */
327 pushq %rsi /* pt_regs->si */
328 pushq %rdx /* pt_regs->dx */
329 pushq %rbp /* pt_regs->cx */
330 movl %ebp, %ecx
331 pushq $-ENOSYS /* pt_regs->ax */
332 sub $(10*8), %rsp /* pt_regs->r8-11, bp, bx, r12-15 not saved */
333
334 /*
335 * No need to do an access_ok check here because r8 has been
336 * 32-bit zero extended:
337 */
338 ASM_STAC
3391: movl (%r8), %ebp
340 _ASM_EXTABLE(1b, ia32_badarg)
341 ASM_CLAC
342 orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
343 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
344 jnz cstar_tracesys
345
346cstar_do_call:
347 /* 32-bit syscall -> 64-bit C ABI argument conversion */
348 movl %edi, %r8d /* arg5 */
349 movl %ebp, %r9d /* arg6 */
350 xchg %ecx, %esi /* rsi:arg2, rcx:arg4 */
351 movl %ebx, %edi /* arg1 */
352 movl %edx, %edx /* arg3 (zero extension) */
353
354cstar_dispatch:
355 cmpq $(IA32_NR_syscalls-1), %rax
356 ja 1f
357
358 call *ia32_sys_call_table(, %rax, 8)
359 movq %rax, RAX(%rsp)
3601:
361 movl RCX(%rsp), %ebp
362 DISABLE_INTERRUPTS(CLBR_NONE)
363 TRACE_IRQS_OFF
364 testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
365 jnz sysretl_audit
366
367sysretl_from_sys_call:
368 andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
369 RESTORE_RSI_RDI_RDX
370 movl RIP(%rsp), %ecx
371 movl EFLAGS(%rsp), %r11d
372 xorq %r10, %r10
373 xorq %r9, %r9
374 xorq %r8, %r8
375 TRACE_IRQS_ON
376 movl RSP(%rsp), %esp
377 /*
378 * 64-bit->32-bit SYSRET restores eip from ecx,
379 * eflags from r11 (but RF and VM bits are forced to 0),
380 * cs and ss are loaded from MSRs.
381 * (Note: 32-bit->32-bit SYSRET is different: since r11
382 * does not exist, it merely sets eflags.IF=1).
383 *
384 * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
385 * descriptor is not reinitialized. This means that we must
386 * avoid SYSRET with SS == NULL, which could happen if we schedule,
387 * exit the kernel, and re-enter using an interrupt vector. (All
388 * interrupt entries on x86_64 set SS to NULL.) We prevent that
389 * from happening by reloading SS in __switch_to.
390 */
391 USERGS_SYSRET32
392
393#ifdef CONFIG_AUDITSYSCALL
394cstar_auditsys:
395 auditsys_entry_common
396 jmp cstar_dispatch
397
398sysretl_audit:
399 auditsys_exit sysretl_from_sys_call
400#endif
401
402cstar_tracesys:
403#ifdef CONFIG_AUDITSYSCALL
404 testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
405 jz cstar_auditsys
406#endif
407 SAVE_EXTRA_REGS
408 xorl %eax, %eax /* Do not leak kernel information */
409 movq %rax, R11(%rsp)
410 movq %rax, R10(%rsp)
411 movq %rax, R9(%rsp)
412 movq %rax, R8(%rsp)
413 movq %rsp, %rdi /* &pt_regs -> arg1 */
414 call syscall_trace_enter
415
416 /* Reload arg registers from stack. (see sysenter_tracesys) */
417 movl RCX(%rsp), %ecx
418 movl RDX(%rsp), %edx
419 movl RSI(%rsp), %esi
420 movl RDI(%rsp), %edi
421 movl %eax, %eax /* zero extension */
422
423 RESTORE_EXTRA_REGS
424 jmp cstar_do_call
425END(entry_SYSCALL_compat)
426
427ia32_badarg:
428 ASM_CLAC
429 movq $-EFAULT, RAX(%rsp)
430ia32_ret_from_sys_call:
431 xorl %eax, %eax /* Do not leak kernel information */
432 movq %rax, R11(%rsp)
433 movq %rax, R10(%rsp)
434 movq %rax, R9(%rsp)
435 movq %rax, R8(%rsp)
436 jmp int_ret_from_sys_call
437
438/*
439 * Emulated IA32 system calls via int 0x80.
440 *
441 * Arguments:
442 * eax system call number
443 * ebx arg1
444 * ecx arg2
445 * edx arg3
446 * esi arg4
447 * edi arg5
448 * ebp arg6 (note: not saved in the stack frame, should not be touched)
449 *
450 * Notes:
451 * Uses the same stack frame as the x86-64 version.
452 * All registers except eax must be saved (but ptrace may violate that).
453 * Arguments are zero extended. For system calls that want sign extension and
454 * take long arguments a wrapper is needed. Most calls can just be called
455 * directly.
456 * Assumes it is only called from user space and entered with interrupts off.
457 */
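
/*
 * Editor's note: for illustration only, a 32-bit user-space caller matching the
 * register convention documented above (compile as 32-bit code, e.g. with -m32;
 * this is a sketch, not part of the kernel sources):
 *
 *	long int80_syscall3(long nr, long a1, long a2, long a3)
 *	{
 *		long ret;
 *
 *		// eax = syscall number, ebx/ecx/edx = arg1..arg3
 *		asm volatile ("int $0x80"
 *			      : "=a" (ret)
 *			      : "a" (nr), "b" (a1), "c" (a2), "d" (a3)
 *			      : "memory");
 *		return ret;
 *	}
 */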
458
459ENTRY(entry_INT80_compat)
460 /*
461 * Interrupts are off on entry.
462 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
463 * it is too small to ever cause noticeable irq latency.
464 */
465 PARAVIRT_ADJUST_EXCEPTION_FRAME
466 SWAPGS
467 ENABLE_INTERRUPTS(CLBR_NONE)
468
469 /* Zero-extending 32-bit regs, do not remove */
470 movl %eax, %eax
471
472 /* Construct struct pt_regs on stack (iret frame is already on stack) */
473 pushq %rax /* pt_regs->orig_ax */
474 pushq %rdi /* pt_regs->di */
475 pushq %rsi /* pt_regs->si */
476 pushq %rdx /* pt_regs->dx */
477 pushq %rcx /* pt_regs->cx */
478 pushq $-ENOSYS /* pt_regs->ax */
479 pushq $0 /* pt_regs->r8 */
480 pushq $0 /* pt_regs->r9 */
481 pushq $0 /* pt_regs->r10 */
482 pushq $0 /* pt_regs->r11 */
483 cld
484 sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */
485
486 orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
487 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
488 jnz ia32_tracesys
489
490ia32_do_call:
491 /* 32-bit syscall -> 64-bit C ABI argument conversion */
492 movl %edi, %r8d /* arg5 */
493 movl %ebp, %r9d /* arg6 */
494 xchg %ecx, %esi /* rsi:arg2, rcx:arg4 */
495 movl %ebx, %edi /* arg1 */
496 movl %edx, %edx /* arg3 (zero extension) */
497 cmpq $(IA32_NR_syscalls-1), %rax
498 ja 1f
499
500 call *ia32_sys_call_table(, %rax, 8)
501 movq %rax, RAX(%rsp)
5021:
503 jmp int_ret_from_sys_call
504
505ia32_tracesys:
506 SAVE_EXTRA_REGS
507 movq %rsp, %rdi /* &pt_regs -> arg1 */
508 call syscall_trace_enter
509 /*
510 * Reload arg registers from stack in case ptrace changed them.
511 * Don't reload %eax because syscall_trace_enter() returned
512 * the %rax value we should see. But do truncate it to 32 bits.
513 * If it's -1 to make us punt the syscall, then (u32)-1 is still
514 * an appropriately invalid value.
515 */
516 movl RCX(%rsp), %ecx
517 movl RDX(%rsp), %edx
518 movl RSI(%rsp), %esi
519 movl RDI(%rsp), %edi
520 movl %eax, %eax /* zero extension */
521 RESTORE_EXTRA_REGS
522 jmp ia32_do_call
523END(entry_INT80_compat)
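
Editor's note: the reason ia32_tracesys reloads the argument registers from
pt_regs is that a ptrace tracer may rewrite them at the syscall-entry stop. A
user-space sketch of such an x86-64 tracer (error handling omitted; the tracee
is assumed to be stopped at a PTRACE_SYSCALL entry stop):

	#include <sys/ptrace.h>
	#include <sys/user.h>
	#include <sys/types.h>

	static void skip_syscall(pid_t pid)
	{
		struct user_regs_struct regs;

		ptrace(PTRACE_GETREGS, pid, 0, &regs);
		regs.orig_rax = -1;	/* invalid nr: the kernel will return -ENOSYS */
		ptrace(PTRACE_SETREGS, pid, 0, &regs);
	}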
524
525 .macro PTREGSCALL label, func
526 ALIGN
527GLOBAL(\label)
528 leaq \func(%rip), %rax
529 jmp ia32_ptregs_common
530 .endm
531
532 PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
533 PTREGSCALL stub32_sigreturn, sys32_sigreturn
534 PTREGSCALL stub32_fork, sys_fork
535 PTREGSCALL stub32_vfork, sys_vfork
536
537 ALIGN
538GLOBAL(stub32_clone)
539 leaq sys_clone(%rip), %rax
540 /*
541 * The 32-bit clone ABI is: clone(..., int tls_val, int *child_tidptr).
542 * The 64-bit clone ABI is: clone(..., int *child_tidptr, int tls_val).
543 *
544 * The native 64-bit kernel's sys_clone() implements the latter,
545 * so we need to swap arguments here before calling it:
546 */
547 xchg %r8, %rcx
548 jmp ia32_ptregs_common
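
	/*
	 * Editor's note: in C terms the xchg above reconciles the two argument
	 * orders named in the comment. A hypothetical wrapper (the leading
	 * arguments and the sys_clone() prototype are assumptions made only
	 * for illustration) would read:
	 *
	 *	extern long sys_clone(unsigned long flags, unsigned long newsp,
	 *			      int *parent_tidptr, int *child_tidptr,
	 *			      unsigned long tls_val);
	 *
	 *	static long compat_clone_wrapper(unsigned long flags,
	 *					 unsigned long newsp,
	 *					 int *parent_tidptr,
	 *					 unsigned long tls_val,
	 *					 int *child_tidptr)
	 *	{
	 *		// hand the last two arguments over in swapped order
	 *		return sys_clone(flags, newsp, parent_tidptr,
	 *				 child_tidptr, tls_val);
	 *	}
	 */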
549
550 ALIGN
551ia32_ptregs_common:
552 SAVE_EXTRA_REGS 8
553 call *%rax
554 RESTORE_EXTRA_REGS 8
555 ret
556END(ia32_ptregs_common)
diff --git a/arch/x86/kernel/syscall_32.c b/arch/x86/entry/syscall_32.c
index 3777189c4a19..8ea34f94e973 100644
--- a/arch/x86/kernel/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -10,7 +10,7 @@
10#else 10#else
11#define SYM(sym, compat) sym 11#define SYM(sym, compat) sym
12#define ia32_sys_call_table sys_call_table 12#define ia32_sys_call_table sys_call_table
13#define __NR_ia32_syscall_max __NR_syscall_max 13#define __NR_syscall_compat_max __NR_syscall_max
14#endif 14#endif
15 15
16#define __SYSCALL_I386(nr, sym, compat) extern asmlinkage void SYM(sym, compat)(void) ; 16#define __SYSCALL_I386(nr, sym, compat) extern asmlinkage void SYM(sym, compat)(void) ;
@@ -23,11 +23,11 @@ typedef asmlinkage void (*sys_call_ptr_t)(void);
23 23
24extern asmlinkage void sys_ni_syscall(void); 24extern asmlinkage void sys_ni_syscall(void);
25 25
26__visible const sys_call_ptr_t ia32_sys_call_table[__NR_ia32_syscall_max+1] = { 26__visible const sys_call_ptr_t ia32_sys_call_table[__NR_syscall_compat_max+1] = {
27 /* 27 /*
28 * Smells like a compiler bug -- it doesn't work 28 * Smells like a compiler bug -- it doesn't work
29 * when the & below is removed. 29 * when the & below is removed.
30 */ 30 */
31 [0 ... __NR_ia32_syscall_max] = &sys_ni_syscall, 31 [0 ... __NR_syscall_compat_max] = &sys_ni_syscall,
32#include <asm/syscalls_32.h> 32#include <asm/syscalls_32.h>
33}; 33};
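
Editor's note: the table above relies on GCC's designated range initializer
("[first ... last] =") so that every slot defaults to sys_ni_syscall before the
generated per-syscall entries override individual slots. A standalone sketch of
the same pattern (all names here are made up for illustration):

	typedef void (*handler_t)(void);

	static void default_handler(void) { }
	static void special_handler(void) { }

	/* fill all 16 slots, then override slot 3 with a later designator */
	static handler_t table[16] = {
		[0 ... 15] = &default_handler,
		[3]        = &special_handler,
	};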
diff --git a/arch/x86/kernel/syscall_64.c b/arch/x86/entry/syscall_64.c
index 4ac730b37f0b..4ac730b37f0b 100644
--- a/arch/x86/kernel/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
diff --git a/arch/x86/syscalls/Makefile b/arch/x86/entry/syscalls/Makefile
index a55abb9f6c5e..57aa59fd140c 100644
--- a/arch/x86/syscalls/Makefile
+++ b/arch/x86/entry/syscalls/Makefile
@@ -1,5 +1,5 @@
1out := $(obj)/../include/generated/asm 1out := $(obj)/../../include/generated/asm
2uapi := $(obj)/../include/generated/uapi/asm 2uapi := $(obj)/../../include/generated/uapi/asm
3 3
4# Create output directory if not already present 4# Create output directory if not already present
5_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \ 5_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f9d28d..ef8187f9d28d 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5f1b19..9ef32d5f1b19 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
diff --git a/arch/x86/syscalls/syscallhdr.sh b/arch/x86/entry/syscalls/syscallhdr.sh
index 31fd5f1f38f7..31fd5f1f38f7 100644
--- a/arch/x86/syscalls/syscallhdr.sh
+++ b/arch/x86/entry/syscalls/syscallhdr.sh
diff --git a/arch/x86/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index 0e7f8ec071e7..0e7f8ec071e7 100644
--- a/arch/x86/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
diff --git a/arch/x86/lib/thunk_32.S b/arch/x86/entry/thunk_32.S
index e407941d0488..e5a17114a8c4 100644
--- a/arch/x86/lib/thunk_32.S
+++ b/arch/x86/entry/thunk_32.S
@@ -6,16 +6,14 @@
6 */ 6 */
7 #include <linux/linkage.h> 7 #include <linux/linkage.h>
8 #include <asm/asm.h> 8 #include <asm/asm.h>
9 #include <asm/dwarf2.h>
10 9
11 /* put return address in eax (arg1) */ 10 /* put return address in eax (arg1) */
12 .macro THUNK name, func, put_ret_addr_in_eax=0 11 .macro THUNK name, func, put_ret_addr_in_eax=0
13 .globl \name 12 .globl \name
14\name: 13\name:
15 CFI_STARTPROC 14 pushl %eax
16 pushl_cfi_reg eax 15 pushl %ecx
17 pushl_cfi_reg ecx 16 pushl %edx
18 pushl_cfi_reg edx
19 17
20 .if \put_ret_addr_in_eax 18 .if \put_ret_addr_in_eax
21 /* Place EIP in the arg1 */ 19 /* Place EIP in the arg1 */
@@ -23,11 +21,10 @@
23 .endif 21 .endif
24 22
25 call \func 23 call \func
26 popl_cfi_reg edx 24 popl %edx
27 popl_cfi_reg ecx 25 popl %ecx
28 popl_cfi_reg eax 26 popl %eax
29 ret 27 ret
30 CFI_ENDPROC
31 _ASM_NOKPROBE(\name) 28 _ASM_NOKPROBE(\name)
32 .endm 29 .endm
33 30
diff --git a/arch/x86/lib/thunk_64.S b/arch/x86/entry/thunk_64.S
index 2198902329b5..efb2b932b748 100644
--- a/arch/x86/lib/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -6,35 +6,32 @@
6 * Subject to the GNU public license, v.2. No warranty of any kind. 6 * Subject to the GNU public license, v.2. No warranty of any kind.
7 */ 7 */
8#include <linux/linkage.h> 8#include <linux/linkage.h>
9#include <asm/dwarf2.h> 9#include "calling.h"
10#include <asm/calling.h>
11#include <asm/asm.h> 10#include <asm/asm.h>
12 11
13 /* rdi: arg1 ... normal C conventions. rax is saved/restored. */ 12 /* rdi: arg1 ... normal C conventions. rax is saved/restored. */
14 .macro THUNK name, func, put_ret_addr_in_rdi=0 13 .macro THUNK name, func, put_ret_addr_in_rdi=0
15 .globl \name 14 .globl \name
16\name: 15\name:
17 CFI_STARTPROC
18 16
19 /* this one pushes 9 elems, the next one would be %rIP */ 17 /* this one pushes 9 elems, the next one would be %rIP */
20 pushq_cfi_reg rdi 18 pushq %rdi
21 pushq_cfi_reg rsi 19 pushq %rsi
22 pushq_cfi_reg rdx 20 pushq %rdx
23 pushq_cfi_reg rcx 21 pushq %rcx
24 pushq_cfi_reg rax 22 pushq %rax
25 pushq_cfi_reg r8 23 pushq %r8
26 pushq_cfi_reg r9 24 pushq %r9
27 pushq_cfi_reg r10 25 pushq %r10
28 pushq_cfi_reg r11 26 pushq %r11
29 27
30 .if \put_ret_addr_in_rdi 28 .if \put_ret_addr_in_rdi
31 /* 9*8(%rsp) is return addr on stack */ 29 /* 9*8(%rsp) is return addr on stack */
32 movq_cfi_restore 9*8, rdi 30 movq 9*8(%rsp), %rdi
33 .endif 31 .endif
34 32
35 call \func 33 call \func
36 jmp restore 34 jmp restore
37 CFI_ENDPROC
38 _ASM_NOKPROBE(\name) 35 _ASM_NOKPROBE(\name)
39 .endm 36 .endm
40 37
@@ -55,19 +52,16 @@
55#if defined(CONFIG_TRACE_IRQFLAGS) \ 52#if defined(CONFIG_TRACE_IRQFLAGS) \
56 || defined(CONFIG_DEBUG_LOCK_ALLOC) \ 53 || defined(CONFIG_DEBUG_LOCK_ALLOC) \
57 || defined(CONFIG_PREEMPT) 54 || defined(CONFIG_PREEMPT)
58 CFI_STARTPROC
59 CFI_ADJUST_CFA_OFFSET 9*8
60restore: 55restore:
61 popq_cfi_reg r11 56 popq %r11
62 popq_cfi_reg r10 57 popq %r10
63 popq_cfi_reg r9 58 popq %r9
64 popq_cfi_reg r8 59 popq %r8
65 popq_cfi_reg rax 60 popq %rax
66 popq_cfi_reg rcx 61 popq %rcx
67 popq_cfi_reg rdx 62 popq %rdx
68 popq_cfi_reg rsi 63 popq %rsi
69 popq_cfi_reg rdi 64 popq %rdi
70 ret 65 ret
71 CFI_ENDPROC
72 _ASM_NOKPROBE(restore) 66 _ASM_NOKPROBE(restore)
73#endif 67#endif
diff --git a/arch/x86/vdso/.gitignore b/arch/x86/entry/vdso/.gitignore
index aae8ffdd5880..aae8ffdd5880 100644
--- a/arch/x86/vdso/.gitignore
+++ b/arch/x86/entry/vdso/.gitignore
diff --git a/arch/x86/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index e97032069f88..e97032069f88 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
diff --git a/arch/x86/vdso/checkundef.sh b/arch/x86/entry/vdso/checkundef.sh
index 7ee90a9b549d..7ee90a9b549d 100755
--- a/arch/x86/vdso/checkundef.sh
+++ b/arch/x86/entry/vdso/checkundef.sh
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 9793322751e0..9793322751e0 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
diff --git a/arch/x86/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index de2c921025f5..de2c921025f5 100644
--- a/arch/x86/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
diff --git a/arch/x86/vdso/vdso-note.S b/arch/x86/entry/vdso/vdso-note.S
index 79a071e4357e..79a071e4357e 100644
--- a/arch/x86/vdso/vdso-note.S
+++ b/arch/x86/entry/vdso/vdso-note.S
diff --git a/arch/x86/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index 6807932643c2..6807932643c2 100644
--- a/arch/x86/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
diff --git a/arch/x86/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 8627db24a7f6..8627db24a7f6 100644
--- a/arch/x86/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
diff --git a/arch/x86/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 0224987556ce..0224987556ce 100644
--- a/arch/x86/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c
index e904c270573b..e904c270573b 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/entry/vdso/vdso32-setup.c
diff --git a/arch/x86/vdso/vdso32/.gitignore b/arch/x86/entry/vdso/vdso32/.gitignore
index e45fba9d0ced..e45fba9d0ced 100644
--- a/arch/x86/vdso/vdso32/.gitignore
+++ b/arch/x86/entry/vdso/vdso32/.gitignore
diff --git a/arch/x86/vdso/vdso32/int80.S b/arch/x86/entry/vdso/vdso32/int80.S
index b15b7c01aedb..b15b7c01aedb 100644
--- a/arch/x86/vdso/vdso32/int80.S
+++ b/arch/x86/entry/vdso/vdso32/int80.S
diff --git a/arch/x86/vdso/vdso32/note.S b/arch/x86/entry/vdso/vdso32/note.S
index c83f25734696..c83f25734696 100644
--- a/arch/x86/vdso/vdso32/note.S
+++ b/arch/x86/entry/vdso/vdso32/note.S
diff --git a/arch/x86/vdso/vdso32/sigreturn.S b/arch/x86/entry/vdso/vdso32/sigreturn.S
index d7ec4e251c0a..d7ec4e251c0a 100644
--- a/arch/x86/vdso/vdso32/sigreturn.S
+++ b/arch/x86/entry/vdso/vdso32/sigreturn.S
diff --git a/arch/x86/vdso/vdso32/syscall.S b/arch/x86/entry/vdso/vdso32/syscall.S
index 6b286bb5251c..6b286bb5251c 100644
--- a/arch/x86/vdso/vdso32/syscall.S
+++ b/arch/x86/entry/vdso/vdso32/syscall.S
diff --git a/arch/x86/vdso/vdso32/sysenter.S b/arch/x86/entry/vdso/vdso32/sysenter.S
index e354bceee0e0..e354bceee0e0 100644
--- a/arch/x86/vdso/vdso32/sysenter.S
+++ b/arch/x86/entry/vdso/vdso32/sysenter.S
diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
index 175cc72c0f68..175cc72c0f68 100644
--- a/arch/x86/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
diff --git a/arch/x86/vdso/vdso32/vdso-fakesections.c b/arch/x86/entry/vdso/vdso32/vdso-fakesections.c
index 541468e25265..541468e25265 100644
--- a/arch/x86/vdso/vdso32/vdso-fakesections.c
+++ b/arch/x86/entry/vdso/vdso32/vdso-fakesections.c
diff --git a/arch/x86/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
index 31056cf294bf..31056cf294bf 100644
--- a/arch/x86/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
diff --git a/arch/x86/vdso/vdsox32.lds.S b/arch/x86/entry/vdso/vdsox32.lds.S
index 697c11ece90c..697c11ece90c 100644
--- a/arch/x86/vdso/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdsox32.lds.S
diff --git a/arch/x86/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
index 8ec3d1f4ce9a..8ec3d1f4ce9a 100644
--- a/arch/x86/vdso/vgetcpu.c
+++ b/arch/x86/entry/vdso/vgetcpu.c
diff --git a/arch/x86/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 1c9f750c3859..1c9f750c3859 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
diff --git a/arch/x86/entry/vsyscall/Makefile b/arch/x86/entry/vsyscall/Makefile
new file mode 100644
index 000000000000..a9f4856f622a
--- /dev/null
+++ b/arch/x86/entry/vsyscall/Makefile
@@ -0,0 +1,7 @@
1#
2# Makefile for the x86 low level vsyscall code
3#
4obj-y := vsyscall_gtod.o
5
6obj-$(CONFIG_X86_VSYSCALL_EMULATION) += vsyscall_64.o vsyscall_emu_64.o
7
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 2dcc6ff6fdcc..2dcc6ff6fdcc 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
diff --git a/arch/x86/kernel/vsyscall_emu_64.S b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
index c9596a9af159..c9596a9af159 100644
--- a/arch/x86/kernel/vsyscall_emu_64.S
+++ b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
diff --git a/arch/x86/kernel/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
index 51e330416995..51e330416995 100644
--- a/arch/x86/kernel/vsyscall_gtod.c
+++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
diff --git a/arch/x86/kernel/vsyscall_trace.h b/arch/x86/entry/vsyscall/vsyscall_trace.h
index a8b2edec54fe..9dd7359a38a8 100644
--- a/arch/x86/kernel/vsyscall_trace.h
+++ b/arch/x86/entry/vsyscall/vsyscall_trace.h
@@ -24,6 +24,6 @@ TRACE_EVENT(emulate_vsyscall,
24#endif 24#endif
25 25
26#undef TRACE_INCLUDE_PATH 26#undef TRACE_INCLUDE_PATH
27#define TRACE_INCLUDE_PATH ../../arch/x86/kernel 27#define TRACE_INCLUDE_PATH ../../arch/x86/entry/vsyscall/
28#define TRACE_INCLUDE_FILE vsyscall_trace 28#define TRACE_INCLUDE_FILE vsyscall_trace
29#include <trace/define_trace.h> 29#include <trace/define_trace.h>
diff --git a/arch/x86/ia32/Makefile b/arch/x86/ia32/Makefile
index bb635c641869..cd4339bae066 100644
--- a/arch/x86/ia32/Makefile
+++ b/arch/x86/ia32/Makefile
@@ -2,7 +2,7 @@
2# Makefile for the ia32 kernel emulation subsystem. 2# Makefile for the ia32 kernel emulation subsystem.
3# 3#
4 4
5obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_signal.o 5obj-$(CONFIG_IA32_EMULATION) := sys_ia32.o ia32_signal.o
6 6
7obj-$(CONFIG_IA32_AOUT) += ia32_aout.o 7obj-$(CONFIG_IA32_AOUT) += ia32_aout.o
8 8
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
deleted file mode 100644
index 72bf2680f819..000000000000
--- a/arch/x86/ia32/ia32entry.S
+++ /dev/null
@@ -1,611 +0,0 @@
1/*
2 * Compatibility mode system call entry point for x86-64.
3 *
4 * Copyright 2000-2002 Andi Kleen, SuSE Labs.
5 */
6
7#include <asm/dwarf2.h>
8#include <asm/calling.h>
9#include <asm/asm-offsets.h>
10#include <asm/current.h>
11#include <asm/errno.h>
12#include <asm/ia32_unistd.h>
13#include <asm/thread_info.h>
14#include <asm/segment.h>
15#include <asm/irqflags.h>
16#include <asm/asm.h>
17#include <asm/smap.h>
18#include <linux/linkage.h>
19#include <linux/err.h>
20
21/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
22#include <linux/elf-em.h>
23#define AUDIT_ARCH_I386 (EM_386|__AUDIT_ARCH_LE)
24#define __AUDIT_ARCH_LE 0x40000000
25
26#ifndef CONFIG_AUDITSYSCALL
27#define sysexit_audit ia32_ret_from_sys_call
28#define sysretl_audit ia32_ret_from_sys_call
29#endif
30
31 .section .entry.text, "ax"
32
33 /* clobbers %rax */
34 .macro CLEAR_RREGS _r9=rax
35 xorl %eax,%eax
36 movq %rax,R11(%rsp)
37 movq %rax,R10(%rsp)
38 movq %\_r9,R9(%rsp)
39 movq %rax,R8(%rsp)
40 .endm
41
42 /*
43 * Reload arg registers from stack in case ptrace changed them.
44 * We don't reload %eax because syscall_trace_enter() returned
45 * the %rax value we should see. Instead, we just truncate that
46 * value to 32 bits again as we did on entry from user mode.
47 * If it's a new value set by user_regset during entry tracing,
48 * this matches the normal truncation of the user-mode value.
49 * If it's -1 to make us punt the syscall, then (u32)-1 is still
50 * an appropriately invalid value.
51 */
52 .macro LOAD_ARGS32 _r9=0
53 .if \_r9
54 movl R9(%rsp),%r9d
55 .endif
56 movl RCX(%rsp),%ecx
57 movl RDX(%rsp),%edx
58 movl RSI(%rsp),%esi
59 movl RDI(%rsp),%edi
60 movl %eax,%eax /* zero extension */
61 .endm
62
63 .macro CFI_STARTPROC32 simple
64 CFI_STARTPROC \simple
65 CFI_UNDEFINED r8
66 CFI_UNDEFINED r9
67 CFI_UNDEFINED r10
68 CFI_UNDEFINED r11
69 CFI_UNDEFINED r12
70 CFI_UNDEFINED r13
71 CFI_UNDEFINED r14
72 CFI_UNDEFINED r15
73 .endm
74
75#ifdef CONFIG_PARAVIRT
76ENTRY(native_usergs_sysret32)
77 swapgs
78 sysretl
79ENDPROC(native_usergs_sysret32)
80
81ENTRY(native_irq_enable_sysexit)
82 swapgs
83 sti
84 sysexit
85ENDPROC(native_irq_enable_sysexit)
86#endif
87
88/*
89 * 32bit SYSENTER instruction entry.
90 *
91 * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
92 * IF and VM in rflags are cleared (IOW: interrupts are off).
93 * SYSENTER does not save anything on the stack,
94 * and does not save old rip (!!!) and rflags.
95 *
96 * Arguments:
97 * eax system call number
98 * ebx arg1
99 * ecx arg2
100 * edx arg3
101 * esi arg4
102 * edi arg5
103 * ebp user stack
104 * 0(%ebp) arg6
105 *
106 * This is purely a fast path. For anything complicated we use the int 0x80
107 * path below. We set up a complete hardware stack frame to share code
108 * with the int 0x80 path.
109 */
110ENTRY(ia32_sysenter_target)
111 CFI_STARTPROC32 simple
112 CFI_SIGNAL_FRAME
113 CFI_DEF_CFA rsp,0
114 CFI_REGISTER rsp,rbp
115
116 /*
117 * Interrupts are off on entry.
118 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
119 * it is too small to ever cause noticeable irq latency.
120 */
121 SWAPGS_UNSAFE_STACK
122 movq PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp
123 ENABLE_INTERRUPTS(CLBR_NONE)
124
125 /* Zero-extending 32-bit regs, do not remove */
126 movl %ebp, %ebp
127 movl %eax, %eax
128
129 movl ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d
130 CFI_REGISTER rip,r10
131
132 /* Construct struct pt_regs on stack */
133 pushq_cfi $__USER32_DS /* pt_regs->ss */
134 pushq_cfi %rbp /* pt_regs->sp */
135 CFI_REL_OFFSET rsp,0
136 pushfq_cfi /* pt_regs->flags */
137 pushq_cfi $__USER32_CS /* pt_regs->cs */
138 pushq_cfi %r10 /* pt_regs->ip = thread_info->sysenter_return */
139 CFI_REL_OFFSET rip,0
140 pushq_cfi_reg rax /* pt_regs->orig_ax */
141 pushq_cfi_reg rdi /* pt_regs->di */
142 pushq_cfi_reg rsi /* pt_regs->si */
143 pushq_cfi_reg rdx /* pt_regs->dx */
144 pushq_cfi_reg rcx /* pt_regs->cx */
145 pushq_cfi_reg rax /* pt_regs->ax */
146 cld
147 sub $(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
148 CFI_ADJUST_CFA_OFFSET 10*8
149
150 /*
151 * no need to do an access_ok check here because rbp has been
152 * 32bit zero extended
153 */
154 ASM_STAC
1551: movl (%rbp),%ebp
156 _ASM_EXTABLE(1b,ia32_badarg)
157 ASM_CLAC
158
159 /*
160 * Sysenter doesn't filter flags, so we need to clear NT
161 * ourselves. To save a few cycles, we can check whether
162 * NT was set instead of doing an unconditional popfq.
163 */
164 testl $X86_EFLAGS_NT,EFLAGS(%rsp)
165 jnz sysenter_fix_flags
166sysenter_flags_fixed:
167
168 orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
169 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
170 CFI_REMEMBER_STATE
171 jnz sysenter_tracesys
172 cmpq $(IA32_NR_syscalls-1),%rax
173 ja ia32_badsys
174sysenter_do_call:
175 /* 32bit syscall -> 64bit C ABI argument conversion */
176 movl %edi,%r8d /* arg5 */
177 movl %ebp,%r9d /* arg6 */
178 xchg %ecx,%esi /* rsi:arg2, rcx:arg4 */
179 movl %ebx,%edi /* arg1 */
180 movl %edx,%edx /* arg3 (zero extension) */
181sysenter_dispatch:
182 call *ia32_sys_call_table(,%rax,8)
183 movq %rax,RAX(%rsp)
184 DISABLE_INTERRUPTS(CLBR_NONE)
185 TRACE_IRQS_OFF
186 testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
187 jnz sysexit_audit
188sysexit_from_sys_call:
189 /*
190 * NB: SYSEXIT is not obviously safe for 64-bit kernels -- an
191 * NMI between STI and SYSEXIT has poorly specified behavior,
 192 * and an NMI followed by an IRQ with usergs is fatal. So
193 * we just pretend we're using SYSEXIT but we really use
194 * SYSRETL instead.
195 *
196 * This code path is still called 'sysexit' because it pairs
197 * with 'sysenter' and it uses the SYSENTER calling convention.
198 */
199 andl $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
200 movl RIP(%rsp),%ecx /* User %eip */
201 CFI_REGISTER rip,rcx
202 RESTORE_RSI_RDI
203 xorl %edx,%edx /* avoid info leaks */
204 xorq %r8,%r8
205 xorq %r9,%r9
206 xorq %r10,%r10
207 movl EFLAGS(%rsp),%r11d /* User eflags */
208 /*CFI_RESTORE rflags*/
209 TRACE_IRQS_ON
210
211 /*
212 * SYSRETL works even on Intel CPUs. Use it in preference to SYSEXIT,
213 * since it avoids a dicey window with interrupts enabled.
214 */
215 movl RSP(%rsp),%esp
216
217 /*
218 * USERGS_SYSRET32 does:
219 * gsbase = user's gs base
220 * eip = ecx
221 * rflags = r11
222 * cs = __USER32_CS
223 * ss = __USER_DS
224 *
225 * The prologue set RIP(%rsp) to VDSO32_SYSENTER_RETURN, which does:
226 *
227 * pop %ebp
228 * pop %edx
229 * pop %ecx
230 *
231 * Therefore, we invoke SYSRETL with EDX and R8-R10 zeroed to
232 * avoid info leaks. R11 ends up with VDSO32_SYSENTER_RETURN's
233 * address (already known to user code), and R12-R15 are
234 * callee-saved and therefore don't contain any interesting
235 * kernel data.
236 */
237 USERGS_SYSRET32
238
239 CFI_RESTORE_STATE
240
241#ifdef CONFIG_AUDITSYSCALL
242 .macro auditsys_entry_common
243 movl %esi,%r8d /* 5th arg: 4th syscall arg */
244 movl %ecx,%r9d /*swap with edx*/
245 movl %edx,%ecx /* 4th arg: 3rd syscall arg */
246 movl %r9d,%edx /* 3rd arg: 2nd syscall arg */
247 movl %ebx,%esi /* 2nd arg: 1st syscall arg */
248 movl %eax,%edi /* 1st arg: syscall number */
249 call __audit_syscall_entry
250 movl RAX(%rsp),%eax /* reload syscall number */
251 cmpq $(IA32_NR_syscalls-1),%rax
252 ja ia32_badsys
253 movl %ebx,%edi /* reload 1st syscall arg */
254 movl RCX(%rsp),%esi /* reload 2nd syscall arg */
255 movl RDX(%rsp),%edx /* reload 3rd syscall arg */
256 movl RSI(%rsp),%ecx /* reload 4th syscall arg */
257 movl RDI(%rsp),%r8d /* reload 5th syscall arg */
258 .endm
259
260 .macro auditsys_exit exit
261 testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
262 jnz ia32_ret_from_sys_call
263 TRACE_IRQS_ON
264 ENABLE_INTERRUPTS(CLBR_NONE)
265 movl %eax,%esi /* second arg, syscall return value */
266 cmpl $-MAX_ERRNO,%eax /* is it an error ? */
267 jbe 1f
268 movslq %eax, %rsi /* if error sign extend to 64 bits */
2691: setbe %al /* 1 if error, 0 if not */
270 movzbl %al,%edi /* zero-extend that into %edi */
271 call __audit_syscall_exit
272 movq RAX(%rsp),%rax /* reload syscall return value */
273 movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),%edi
274 DISABLE_INTERRUPTS(CLBR_NONE)
275 TRACE_IRQS_OFF
276 testl %edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
277 jz \exit
278 CLEAR_RREGS
279 jmp int_with_check
280 .endm
281
282sysenter_auditsys:
283 auditsys_entry_common
284 movl %ebp,%r9d /* reload 6th syscall arg */
285 jmp sysenter_dispatch
286
287sysexit_audit:
288 auditsys_exit sysexit_from_sys_call
289#endif
290
291sysenter_fix_flags:
292 pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
293 popfq_cfi
294 jmp sysenter_flags_fixed
295
296sysenter_tracesys:
297#ifdef CONFIG_AUDITSYSCALL
298 testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
299 jz sysenter_auditsys
300#endif
301 SAVE_EXTRA_REGS
302 CLEAR_RREGS
303 movq $-ENOSYS,RAX(%rsp)/* ptrace can change this for a bad syscall */
304 movq %rsp,%rdi /* &pt_regs -> arg1 */
305 call syscall_trace_enter
306 LOAD_ARGS32 /* reload args from stack in case ptrace changed it */
307 RESTORE_EXTRA_REGS
308 cmpq $(IA32_NR_syscalls-1),%rax
309 ja int_ret_from_sys_call /* sysenter_tracesys has set RAX(%rsp) */
310 jmp sysenter_do_call
311 CFI_ENDPROC
312ENDPROC(ia32_sysenter_target)
313
314/*
315 * 32bit SYSCALL instruction entry.
316 *
317 * 32bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
318 * then loads new ss, cs, and rip from previously programmed MSRs.
319 * rflags gets masked by a value from another MSR (so CLD and CLAC
320 * are not needed). SYSCALL does not save anything on the stack
321 * and does not change rsp.
322 *
323 * Note: rflags saving+masking-with-MSR happens only in Long mode
324 * (in legacy 32bit mode, IF, RF and VM bits are cleared and that's it).
325 * Don't get confused: rflags saving+masking depends on Long Mode Active bit
326 * (EFER.LMA=1), NOT on bitness of userspace where SYSCALL executes
327 * or target CS descriptor's L bit (SYSCALL does not read segment descriptors).
328 *
329 * Arguments:
330 * eax system call number
331 * ecx return address
332 * ebx arg1
333 * ebp arg2 (note: not saved in the stack frame, should not be touched)
334 * edx arg3
335 * esi arg4
336 * edi arg5
337 * esp user stack
338 * 0(%esp) arg6
339 *
340 * This is purely a fast path. For anything complicated we use the int 0x80
341 * path below. We set up a complete hardware stack frame to share code
342 * with the int 0x80 path.
343 */
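
	/*
	 * Editor's note: a sketch of the MSR programming the comment refers to
	 * (the real setup is done at boot, e.g. in syscall_init(); the flag
	 * mask shown here is an illustrative subset, not the kernel's full
	 * value):
	 *
	 *	#include <asm/msr.h>
	 *	#include <asm/processor-flags.h>
	 *
	 *	extern void ia32_cstar_target(void);
	 *
	 *	static void cstar_program_msrs(void)
	 *	{
	 *		// compat-mode SYSCALL entry point
	 *		wrmsrl(MSR_CSTAR, (unsigned long)ia32_cstar_target);
	 *		// rflags bits cleared on every SYSCALL entry
	 *		wrmsrl(MSR_SYSCALL_MASK,
	 *		       X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF);
	 *	}
	 */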
344ENTRY(ia32_cstar_target)
345 CFI_STARTPROC32 simple
346 CFI_SIGNAL_FRAME
347 CFI_DEF_CFA rsp,0
348 CFI_REGISTER rip,rcx
349 /*CFI_REGISTER rflags,r11*/
350
351 /*
352 * Interrupts are off on entry.
353 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
354 * it is too small to ever cause noticeable irq latency.
355 */
356 SWAPGS_UNSAFE_STACK
357 movl %esp,%r8d
358 CFI_REGISTER rsp,r8
359 movq PER_CPU_VAR(kernel_stack),%rsp
360 ENABLE_INTERRUPTS(CLBR_NONE)
361
362 /* Zero-extending 32-bit regs, do not remove */
363 movl %eax,%eax
364
365 /* Construct struct pt_regs on stack */
366 pushq_cfi $__USER32_DS /* pt_regs->ss */
367 pushq_cfi %r8 /* pt_regs->sp */
368 CFI_REL_OFFSET rsp,0
369 pushq_cfi %r11 /* pt_regs->flags */
370 pushq_cfi $__USER32_CS /* pt_regs->cs */
371 pushq_cfi %rcx /* pt_regs->ip */
372 CFI_REL_OFFSET rip,0
373 pushq_cfi_reg rax /* pt_regs->orig_ax */
374 pushq_cfi_reg rdi /* pt_regs->di */
375 pushq_cfi_reg rsi /* pt_regs->si */
376 pushq_cfi_reg rdx /* pt_regs->dx */
377 pushq_cfi_reg rbp /* pt_regs->cx */
378 movl %ebp,%ecx
379 pushq_cfi_reg rax /* pt_regs->ax */
380 sub $(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
381 CFI_ADJUST_CFA_OFFSET 10*8
382
383 /*
384 * no need to do an access_ok check here because r8 has been
385 * 32bit zero extended
386 */
387 ASM_STAC
3881: movl (%r8),%r9d
389 _ASM_EXTABLE(1b,ia32_badarg)
390 ASM_CLAC
391 orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
392 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
393 CFI_REMEMBER_STATE
394 jnz cstar_tracesys
395 cmpq $IA32_NR_syscalls-1,%rax
396 ja ia32_badsys
397cstar_do_call:
398 /* 32bit syscall -> 64bit C ABI argument conversion */
399 movl %edi,%r8d /* arg5 */
400 /* r9 already loaded */ /* arg6 */
401 xchg %ecx,%esi /* rsi:arg2, rcx:arg4 */
402 movl %ebx,%edi /* arg1 */
403 movl %edx,%edx /* arg3 (zero extension) */
404cstar_dispatch:
405 call *ia32_sys_call_table(,%rax,8)
406 movq %rax,RAX(%rsp)
407 DISABLE_INTERRUPTS(CLBR_NONE)
408 TRACE_IRQS_OFF
409 testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
410 jnz sysretl_audit
411sysretl_from_sys_call:
412 andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
413 RESTORE_RSI_RDI_RDX
414 movl RIP(%rsp),%ecx
415 CFI_REGISTER rip,rcx
416 movl EFLAGS(%rsp),%r11d
417 /*CFI_REGISTER rflags,r11*/
418 xorq %r10,%r10
419 xorq %r9,%r9
420 xorq %r8,%r8
421 TRACE_IRQS_ON
422 movl RSP(%rsp),%esp
423 CFI_RESTORE rsp
424 /*
425 * 64bit->32bit SYSRET restores eip from ecx,
426 * eflags from r11 (but RF and VM bits are forced to 0),
427 * cs and ss are loaded from MSRs.
428 * (Note: 32bit->32bit SYSRET is different: since r11
429 * does not exist, it merely sets eflags.IF=1).
430 *
431 * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
432 * descriptor is not reinitialized. This means that we must
433 * avoid SYSRET with SS == NULL, which could happen if we schedule,
434 * exit the kernel, and re-enter using an interrupt vector. (All
435 * interrupt entries on x86_64 set SS to NULL.) We prevent that
436 * from happening by reloading SS in __switch_to.
437 */
438 USERGS_SYSRET32
439
440#ifdef CONFIG_AUDITSYSCALL
441cstar_auditsys:
442 CFI_RESTORE_STATE
443 movl %r9d,R9(%rsp) /* register to be clobbered by call */
444 auditsys_entry_common
445 movl R9(%rsp),%r9d /* reload 6th syscall arg */
446 jmp cstar_dispatch
447
448sysretl_audit:
449 auditsys_exit sysretl_from_sys_call
450#endif
451
452cstar_tracesys:
453#ifdef CONFIG_AUDITSYSCALL
454 testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
455 jz cstar_auditsys
456#endif
457 xchgl %r9d,%ebp
458 SAVE_EXTRA_REGS
459 CLEAR_RREGS r9
460 movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
461 movq %rsp,%rdi /* &pt_regs -> arg1 */
462 call syscall_trace_enter
463 LOAD_ARGS32 1 /* reload args from stack in case ptrace changed it */
464 RESTORE_EXTRA_REGS
465 xchgl %ebp,%r9d
466 cmpq $(IA32_NR_syscalls-1),%rax
467 ja int_ret_from_sys_call /* cstar_tracesys has set RAX(%rsp) */
468 jmp cstar_do_call
469END(ia32_cstar_target)
470
471ia32_badarg:
472 ASM_CLAC
473 movq $-EFAULT,%rax
474 jmp ia32_sysret
475 CFI_ENDPROC
476
477/*
478 * Emulated IA32 system calls via int 0x80.
479 *
480 * Arguments:
481 * eax system call number
482 * ebx arg1
483 * ecx arg2
484 * edx arg3
485 * esi arg4
486 * edi arg5
487 * ebp arg6 (note: not saved in the stack frame, should not be touched)
488 *
489 * Notes:
490 * Uses the same stack frame as the x86-64 version.
491 * All registers except eax must be saved (but ptrace may violate that).
492 * Arguments are zero extended. For system calls that want sign extension and
493 * take long arguments a wrapper is needed. Most calls can just be called
494 * directly.
495 * Assumes it is only called from user space and entered with interrupts off.
496 */
497
498ENTRY(ia32_syscall)
499 CFI_STARTPROC32 simple
500 CFI_SIGNAL_FRAME
501 CFI_DEF_CFA rsp,5*8
502 /*CFI_REL_OFFSET ss,4*8 */
503 CFI_REL_OFFSET rsp,3*8
504 /*CFI_REL_OFFSET rflags,2*8 */
505 /*CFI_REL_OFFSET cs,1*8 */
506 CFI_REL_OFFSET rip,0*8
507
508 /*
509 * Interrupts are off on entry.
510 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
511 * it is too small to ever cause noticeable irq latency.
512 */
513 PARAVIRT_ADJUST_EXCEPTION_FRAME
514 SWAPGS
515 ENABLE_INTERRUPTS(CLBR_NONE)
516
517 /* Zero-extending 32-bit regs, do not remove */
518 movl %eax,%eax
519
520 /* Construct struct pt_regs on stack (iret frame is already on stack) */
521 pushq_cfi_reg rax /* pt_regs->orig_ax */
522 pushq_cfi_reg rdi /* pt_regs->di */
523 pushq_cfi_reg rsi /* pt_regs->si */
524 pushq_cfi_reg rdx /* pt_regs->dx */
525 pushq_cfi_reg rcx /* pt_regs->cx */
526 pushq_cfi_reg rax /* pt_regs->ax */
527 cld
528 sub $(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
529 CFI_ADJUST_CFA_OFFSET 10*8
530
531 orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
532 testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
533 jnz ia32_tracesys
534 cmpq $(IA32_NR_syscalls-1),%rax
535 ja ia32_badsys
536ia32_do_call:
537 /* 32bit syscall -> 64bit C ABI argument conversion */
538 movl %edi,%r8d /* arg5 */
539 movl %ebp,%r9d /* arg6 */
540 xchg %ecx,%esi /* rsi:arg2, rcx:arg4 */
541 movl %ebx,%edi /* arg1 */
542 movl %edx,%edx /* arg3 (zero extension) */
543 call *ia32_sys_call_table(,%rax,8) # xxx: rip relative
544ia32_sysret:
545 movq %rax,RAX(%rsp)
546ia32_ret_from_sys_call:
547 CLEAR_RREGS
548 jmp int_ret_from_sys_call
549
550ia32_tracesys:
551 SAVE_EXTRA_REGS
552 CLEAR_RREGS
553 movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
554 movq %rsp,%rdi /* &pt_regs -> arg1 */
555 call syscall_trace_enter
556 LOAD_ARGS32 /* reload args from stack in case ptrace changed it */
557 RESTORE_EXTRA_REGS
558 cmpq $(IA32_NR_syscalls-1),%rax
559 ja int_ret_from_sys_call /* ia32_tracesys has set RAX(%rsp) */
560 jmp ia32_do_call
561END(ia32_syscall)
562
563ia32_badsys:
564 movq $0,ORIG_RAX(%rsp)
565 movq $-ENOSYS,%rax
566 jmp ia32_sysret
567
568 CFI_ENDPROC
569
570 .macro PTREGSCALL label, func
571 ALIGN
572GLOBAL(\label)
573 leaq \func(%rip),%rax
574 jmp ia32_ptregs_common
575 .endm
576
577 CFI_STARTPROC32
578
579 PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
580 PTREGSCALL stub32_sigreturn, sys32_sigreturn
581 PTREGSCALL stub32_fork, sys_fork
582 PTREGSCALL stub32_vfork, sys_vfork
583
584 ALIGN
585GLOBAL(stub32_clone)
586 leaq sys_clone(%rip),%rax
587 mov %r8, %rcx
588 jmp ia32_ptregs_common
589
590 ALIGN
591ia32_ptregs_common:
592 CFI_ENDPROC
593 CFI_STARTPROC32 simple
594 CFI_SIGNAL_FRAME
595 CFI_DEF_CFA rsp,SIZEOF_PTREGS
596 CFI_REL_OFFSET rax,RAX
597 CFI_REL_OFFSET rcx,RCX
598 CFI_REL_OFFSET rdx,RDX
599 CFI_REL_OFFSET rsi,RSI
600 CFI_REL_OFFSET rdi,RDI
601 CFI_REL_OFFSET rip,RIP
602/* CFI_REL_OFFSET cs,CS*/
603/* CFI_REL_OFFSET rflags,EFLAGS*/
604 CFI_REL_OFFSET rsp,RSP
605/* CFI_REL_OFFSET ss,SS*/
606 SAVE_EXTRA_REGS 8
607 call *%rax
608 RESTORE_EXTRA_REGS 8
609 ret
610 CFI_ENDPROC
611END(ia32_ptregs_common)
diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index bdf02eeee765..e7636bac7372 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -18,6 +18,12 @@
18 .endm 18 .endm
19#endif 19#endif
20 20
21/*
22 * Issue one struct alt_instr descriptor entry (need to put it into
23 * the section .altinstructions, see below). This entry contains
24 * enough information for the alternatives patching code to patch an
25 * instruction. See apply_alternatives().
26 */
21.macro altinstruction_entry orig alt feature orig_len alt_len pad_len 27.macro altinstruction_entry orig alt feature orig_len alt_len pad_len
22 .long \orig - . 28 .long \orig - .
23 .long \alt - . 29 .long \alt - .
@@ -27,6 +33,12 @@
27 .byte \pad_len 33 .byte \pad_len
28.endm 34.endm
29 35
36/*
37 * Define an alternative between two instructions. If @feature is
38 * present, early code in apply_alternatives() replaces @oldinstr with
39 * @newinstr. ".skip" directive takes care of proper instruction padding
40 * in case @newinstr is longer than @oldinstr.
41 */
30.macro ALTERNATIVE oldinstr, newinstr, feature 42.macro ALTERNATIVE oldinstr, newinstr, feature
31140: 43140:
32 \oldinstr 44 \oldinstr
@@ -55,6 +67,12 @@
55 */ 67 */
56#define alt_max_short(a, b) ((a) ^ (((a) ^ (b)) & -(-((a) < (b))))) 68#define alt_max_short(a, b) ((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))
57 69
70
71/*
72 * Same as ALTERNATIVE macro above but for two alternatives. If CPU
73 * has @feature1, it replaces @oldinstr with @newinstr1. If CPU has
 74 * @feature2, it replaces @oldinstr with @newinstr2.
75 */
58.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2 76.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2
59140: 77140:
60 \oldinstr 78 \oldinstr
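
Editor's note: on the C side the same patching mechanism is exposed through the
alternative() macro in <asm/alternative.h>. A hedged sketch in the spirit of the
historic 32-bit memory-barrier definition (the instruction strings are
illustrative and only make sense for 32-bit code):

	#include <asm/alternative.h>
	#include <asm/cpufeature.h>

	static inline void my_mb(void)
	{
		/* use MFENCE where SSE2 is present, a locked add otherwise */
		alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2);
	}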
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 976b86a325e5..c8393634ca0c 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -644,6 +644,12 @@ static inline void entering_ack_irq(void)
644 entering_irq(); 644 entering_irq();
645} 645}
646 646
647static inline void ipi_entering_ack_irq(void)
648{
649 ack_APIC_irq();
650 irq_enter();
651}
652
647static inline void exiting_irq(void) 653static inline void exiting_irq(void)
648{ 654{
649 irq_exit(); 655 irq_exit();
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 7730c1c5c83a..189679aba703 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -63,6 +63,31 @@
63 _ASM_ALIGN ; \ 63 _ASM_ALIGN ; \
64 _ASM_PTR (entry); \ 64 _ASM_PTR (entry); \
65 .popsection 65 .popsection
66
67.macro ALIGN_DESTINATION
68 /* check for bad alignment of destination */
69 movl %edi,%ecx
70 andl $7,%ecx
71 jz 102f /* already aligned */
72 subl $8,%ecx
73 negl %ecx
74 subl %ecx,%edx
75100: movb (%rsi),%al
76101: movb %al,(%rdi)
77 incq %rsi
78 incq %rdi
79 decl %ecx
80 jnz 100b
81102:
82 .section .fixup,"ax"
83103: addl %ecx,%edx /* ecx is zerorest also */
84 jmp copy_user_handle_tail
85 .previous
86
87 _ASM_EXTABLE(100b,103b)
88 _ASM_EXTABLE(101b,103b)
89 .endm
90
66#else 91#else
67# define _ASM_EXTABLE(from,to) \ 92# define _ASM_EXTABLE(from,to) \
68 " .pushsection \"__ex_table\",\"a\"\n" \ 93 " .pushsection \"__ex_table\",\"a\"\n" \
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 5e5cd123fdfb..e9168955c42f 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -22,7 +22,7 @@
22 * 22 *
23 * Atomically reads the value of @v. 23 * Atomically reads the value of @v.
24 */ 24 */
25static inline int atomic_read(const atomic_t *v) 25static __always_inline int atomic_read(const atomic_t *v)
26{ 26{
27 return ACCESS_ONCE((v)->counter); 27 return ACCESS_ONCE((v)->counter);
28} 28}
@@ -34,7 +34,7 @@ static inline int atomic_read(const atomic_t *v)
34 * 34 *
35 * Atomically sets the value of @v to @i. 35 * Atomically sets the value of @v to @i.
36 */ 36 */
37static inline void atomic_set(atomic_t *v, int i) 37static __always_inline void atomic_set(atomic_t *v, int i)
38{ 38{
39 v->counter = i; 39 v->counter = i;
40} 40}
@@ -46,7 +46,7 @@ static inline void atomic_set(atomic_t *v, int i)
46 * 46 *
47 * Atomically adds @i to @v. 47 * Atomically adds @i to @v.
48 */ 48 */
49static inline void atomic_add(int i, atomic_t *v) 49static __always_inline void atomic_add(int i, atomic_t *v)
50{ 50{
51 asm volatile(LOCK_PREFIX "addl %1,%0" 51 asm volatile(LOCK_PREFIX "addl %1,%0"
52 : "+m" (v->counter) 52 : "+m" (v->counter)
@@ -60,7 +60,7 @@ static inline void atomic_add(int i, atomic_t *v)
60 * 60 *
61 * Atomically subtracts @i from @v. 61 * Atomically subtracts @i from @v.
62 */ 62 */
63static inline void atomic_sub(int i, atomic_t *v) 63static __always_inline void atomic_sub(int i, atomic_t *v)
64{ 64{
65 asm volatile(LOCK_PREFIX "subl %1,%0" 65 asm volatile(LOCK_PREFIX "subl %1,%0"
66 : "+m" (v->counter) 66 : "+m" (v->counter)
@@ -76,7 +76,7 @@ static inline void atomic_sub(int i, atomic_t *v)
76 * true if the result is zero, or false for all 76 * true if the result is zero, or false for all
77 * other cases. 77 * other cases.
78 */ 78 */
79static inline int atomic_sub_and_test(int i, atomic_t *v) 79static __always_inline int atomic_sub_and_test(int i, atomic_t *v)
80{ 80{
81 GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", "e"); 81 GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", "e");
82} 82}
@@ -87,7 +87,7 @@ static inline int atomic_sub_and_test(int i, atomic_t *v)
87 * 87 *
88 * Atomically increments @v by 1. 88 * Atomically increments @v by 1.
89 */ 89 */
90static inline void atomic_inc(atomic_t *v) 90static __always_inline void atomic_inc(atomic_t *v)
91{ 91{
92 asm volatile(LOCK_PREFIX "incl %0" 92 asm volatile(LOCK_PREFIX "incl %0"
93 : "+m" (v->counter)); 93 : "+m" (v->counter));
@@ -99,7 +99,7 @@ static inline void atomic_inc(atomic_t *v)
99 * 99 *
100 * Atomically decrements @v by 1. 100 * Atomically decrements @v by 1.
101 */ 101 */
102static inline void atomic_dec(atomic_t *v) 102static __always_inline void atomic_dec(atomic_t *v)
103{ 103{
104 asm volatile(LOCK_PREFIX "decl %0" 104 asm volatile(LOCK_PREFIX "decl %0"
105 : "+m" (v->counter)); 105 : "+m" (v->counter));
@@ -113,7 +113,7 @@ static inline void atomic_dec(atomic_t *v)
113 * returns true if the result is 0, or false for all other 113 * returns true if the result is 0, or false for all other
114 * cases. 114 * cases.
115 */ 115 */
116static inline int atomic_dec_and_test(atomic_t *v) 116static __always_inline int atomic_dec_and_test(atomic_t *v)
117{ 117{
118 GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e"); 118 GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e");
119} 119}
@@ -126,7 +126,7 @@ static inline int atomic_dec_and_test(atomic_t *v)
126 * and returns true if the result is zero, or false for all 126 * and returns true if the result is zero, or false for all
127 * other cases. 127 * other cases.
128 */ 128 */
129static inline int atomic_inc_and_test(atomic_t *v) 129static __always_inline int atomic_inc_and_test(atomic_t *v)
130{ 130{
131 GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e"); 131 GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e");
132} 132}
@@ -140,7 +140,7 @@ static inline int atomic_inc_and_test(atomic_t *v)
140 * if the result is negative, or false when 140 * if the result is negative, or false when
141 * result is greater than or equal to zero. 141 * result is greater than or equal to zero.
142 */ 142 */
143static inline int atomic_add_negative(int i, atomic_t *v) 143static __always_inline int atomic_add_negative(int i, atomic_t *v)
144{ 144{
145 GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", "s"); 145 GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", "s");
146} 146}
@@ -152,7 +152,7 @@ static inline int atomic_add_negative(int i, atomic_t *v)
152 * 152 *
153 * Atomically adds @i to @v and returns @i + @v 153 * Atomically adds @i to @v and returns @i + @v
154 */ 154 */
155static inline int atomic_add_return(int i, atomic_t *v) 155static __always_inline int atomic_add_return(int i, atomic_t *v)
156{ 156{
157 return i + xadd(&v->counter, i); 157 return i + xadd(&v->counter, i);
158} 158}
@@ -164,7 +164,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
164 * 164 *
165 * Atomically subtracts @i from @v and returns @v - @i 165 * Atomically subtracts @i from @v and returns @v - @i
166 */ 166 */
167static inline int atomic_sub_return(int i, atomic_t *v) 167static __always_inline int atomic_sub_return(int i, atomic_t *v)
168{ 168{
169 return atomic_add_return(-i, v); 169 return atomic_add_return(-i, v);
170} 170}
@@ -172,7 +172,7 @@ static inline int atomic_sub_return(int i, atomic_t *v)
172#define atomic_inc_return(v) (atomic_add_return(1, v)) 172#define atomic_inc_return(v) (atomic_add_return(1, v))
173#define atomic_dec_return(v) (atomic_sub_return(1, v)) 173#define atomic_dec_return(v) (atomic_sub_return(1, v))
174 174
175static inline int atomic_cmpxchg(atomic_t *v, int old, int new) 175static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
176{ 176{
177 return cmpxchg(&v->counter, old, new); 177 return cmpxchg(&v->counter, old, new);
178} 178}
@@ -191,7 +191,7 @@ static inline int atomic_xchg(atomic_t *v, int new)
191 * Atomically adds @a to @v, so long as @v was not already @u. 191 * Atomically adds @a to @v, so long as @v was not already @u.
192 * Returns the old value of @v. 192 * Returns the old value of @v.
193 */ 193 */
194static inline int __atomic_add_unless(atomic_t *v, int a, int u) 194static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
195{ 195{
196 int c, old; 196 int c, old;
197 c = atomic_read(v); 197 c = atomic_read(v);
@@ -213,7 +213,7 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
213 * Atomically adds 1 to @v 213 * Atomically adds 1 to @v
214 * Returns the new value of @u 214 * Returns the new value of @u
215 */ 215 */
216static inline short int atomic_inc_short(short int *v) 216static __always_inline short int atomic_inc_short(short int *v)
217{ 217{
218 asm(LOCK_PREFIX "addw $1, %0" : "+m" (*v)); 218 asm(LOCK_PREFIX "addw $1, %0" : "+m" (*v));
219 return *v; 219 return *v;
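
Editor's note: a short usage illustration of the atomic_t API whose kernel-doc
comments appear in the hunks above (a refcount-style pattern; freeing the object
is left as a comment):

	#include <linux/types.h>
	#include <linux/atomic.h>

	static atomic_t refcount = ATOMIC_INIT(1);

	static bool get_ref(void)
	{
		/* take a reference unless the count already dropped to zero */
		return atomic_add_unless(&refcount, 1, 0);
	}

	static void put_ref(void)
	{
		if (atomic_dec_and_test(&refcount))
			; /* last reference dropped: free the object here */
	}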
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index f8d273e18516..b965f9e03f2a 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -40,7 +40,7 @@ static inline void atomic64_set(atomic64_t *v, long i)
40 * 40 *
41 * Atomically adds @i to @v. 41 * Atomically adds @i to @v.
42 */ 42 */
43static inline void atomic64_add(long i, atomic64_t *v) 43static __always_inline void atomic64_add(long i, atomic64_t *v)
44{ 44{
45 asm volatile(LOCK_PREFIX "addq %1,%0" 45 asm volatile(LOCK_PREFIX "addq %1,%0"
46 : "=m" (v->counter) 46 : "=m" (v->counter)
@@ -81,7 +81,7 @@ static inline int atomic64_sub_and_test(long i, atomic64_t *v)
81 * 81 *
82 * Atomically increments @v by 1. 82 * Atomically increments @v by 1.
83 */ 83 */
84static inline void atomic64_inc(atomic64_t *v) 84static __always_inline void atomic64_inc(atomic64_t *v)
85{ 85{
86 asm volatile(LOCK_PREFIX "incq %0" 86 asm volatile(LOCK_PREFIX "incq %0"
87 : "=m" (v->counter) 87 : "=m" (v->counter)
@@ -94,7 +94,7 @@ static inline void atomic64_inc(atomic64_t *v)
94 * 94 *
95 * Atomically decrements @v by 1. 95 * Atomically decrements @v by 1.
96 */ 96 */
97static inline void atomic64_dec(atomic64_t *v) 97static __always_inline void atomic64_dec(atomic64_t *v)
98{ 98{
99 asm volatile(LOCK_PREFIX "decq %0" 99 asm volatile(LOCK_PREFIX "decq %0"
100 : "=m" (v->counter) 100 : "=m" (v->counter)
@@ -148,7 +148,7 @@ static inline int atomic64_add_negative(long i, atomic64_t *v)
148 * 148 *
149 * Atomically adds @i to @v and returns @i + @v 149 * Atomically adds @i to @v and returns @i + @v
150 */ 150 */
151static inline long atomic64_add_return(long i, atomic64_t *v) 151static __always_inline long atomic64_add_return(long i, atomic64_t *v)
152{ 152{
153 return i + xadd(&v->counter, i); 153 return i + xadd(&v->counter, i);
154} 154}
diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 47c8e32f621a..b6f7457d12e4 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -8,7 +8,7 @@
8/* 8/*
9 * The set_memory_* API can be used to change various attributes of a virtual 9 * The set_memory_* API can be used to change various attributes of a virtual
10 * address range. The attributes include: 10 * address range. The attributes include:
 11 * Cacheability : UnCached, WriteCombining, WriteBack 11 * Cacheability : UnCached, WriteCombining, WriteThrough, WriteBack
 12 * Executability : eXecutable, NoteXecutable 12 * Executability : eXecutable, NoteXecutable
13 * Read/Write : ReadOnly, ReadWrite 13 * Read/Write : ReadOnly, ReadWrite
14 * Presence : NotPresent 14 * Presence : NotPresent
@@ -35,9 +35,11 @@
35 35
36int _set_memory_uc(unsigned long addr, int numpages); 36int _set_memory_uc(unsigned long addr, int numpages);
37int _set_memory_wc(unsigned long addr, int numpages); 37int _set_memory_wc(unsigned long addr, int numpages);
38int _set_memory_wt(unsigned long addr, int numpages);
38int _set_memory_wb(unsigned long addr, int numpages); 39int _set_memory_wb(unsigned long addr, int numpages);
39int set_memory_uc(unsigned long addr, int numpages); 40int set_memory_uc(unsigned long addr, int numpages);
40int set_memory_wc(unsigned long addr, int numpages); 41int set_memory_wc(unsigned long addr, int numpages);
42int set_memory_wt(unsigned long addr, int numpages);
41int set_memory_wb(unsigned long addr, int numpages); 43int set_memory_wb(unsigned long addr, int numpages);
42int set_memory_x(unsigned long addr, int numpages); 44int set_memory_x(unsigned long addr, int numpages);
43int set_memory_nx(unsigned long addr, int numpages); 45int set_memory_nx(unsigned long addr, int numpages);
@@ -48,10 +50,12 @@ int set_memory_4k(unsigned long addr, int numpages);
48 50
49int set_memory_array_uc(unsigned long *addr, int addrinarray); 51int set_memory_array_uc(unsigned long *addr, int addrinarray);
50int set_memory_array_wc(unsigned long *addr, int addrinarray); 52int set_memory_array_wc(unsigned long *addr, int addrinarray);
53int set_memory_array_wt(unsigned long *addr, int addrinarray);
51int set_memory_array_wb(unsigned long *addr, int addrinarray); 54int set_memory_array_wb(unsigned long *addr, int addrinarray);
52 55
53int set_pages_array_uc(struct page **pages, int addrinarray); 56int set_pages_array_uc(struct page **pages, int addrinarray);
54int set_pages_array_wc(struct page **pages, int addrinarray); 57int set_pages_array_wc(struct page **pages, int addrinarray);
58int set_pages_array_wt(struct page **pages, int addrinarray);
55int set_pages_array_wb(struct page **pages, int addrinarray); 59int set_pages_array_wb(struct page **pages, int addrinarray);
56 60
57/* 61/*
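
Editor's note: a hypothetical driver-style use of the new Write-Through
interface declared above (the helper name and the idea of converting an existing
kernel mapping are assumptions made for illustration):

	#include <asm/cacheflush.h>

	static int make_buffer_write_through(void *buf, int numpages)
	{
		/* switch the existing kernel mapping of buf to Write-Through */
		return set_memory_wt((unsigned long)buf, numpages);
	}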
diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
deleted file mode 100644
index de1cdaf4d743..000000000000
--- a/arch/x86/include/asm/dwarf2.h
+++ /dev/null
@@ -1,170 +0,0 @@
1#ifndef _ASM_X86_DWARF2_H
2#define _ASM_X86_DWARF2_H
3
4#ifndef __ASSEMBLY__
5#warning "asm/dwarf2.h should be only included in pure assembly files"
6#endif
7
8/*
9 * Macros for dwarf2 CFI unwind table entries.
10 * See "as.info" for details on these pseudo ops. Unfortunately
11 * they are only supported in very new binutils, so define them
 12 * away for older versions.
13 */
14
15#ifdef CONFIG_AS_CFI
16
17#define CFI_STARTPROC .cfi_startproc
18#define CFI_ENDPROC .cfi_endproc
19#define CFI_DEF_CFA .cfi_def_cfa
20#define CFI_DEF_CFA_REGISTER .cfi_def_cfa_register
21#define CFI_DEF_CFA_OFFSET .cfi_def_cfa_offset
22#define CFI_ADJUST_CFA_OFFSET .cfi_adjust_cfa_offset
23#define CFI_OFFSET .cfi_offset
24#define CFI_REL_OFFSET .cfi_rel_offset
25#define CFI_REGISTER .cfi_register
26#define CFI_RESTORE .cfi_restore
27#define CFI_REMEMBER_STATE .cfi_remember_state
28#define CFI_RESTORE_STATE .cfi_restore_state
29#define CFI_UNDEFINED .cfi_undefined
30#define CFI_ESCAPE .cfi_escape
31
32#ifdef CONFIG_AS_CFI_SIGNAL_FRAME
33#define CFI_SIGNAL_FRAME .cfi_signal_frame
34#else
35#define CFI_SIGNAL_FRAME
36#endif
37
38#if defined(CONFIG_AS_CFI_SECTIONS) && defined(__ASSEMBLY__)
39 /*
40 * Emit CFI data in .debug_frame sections, not .eh_frame sections.
41 * The latter we currently just discard since we don't do DWARF
42 * unwinding at runtime. So only the offline DWARF information is
43 * useful to anyone. Note we should not use this directive if this
44 * file is used in the vDSO assembly, or if vmlinux.lds.S gets
45 * changed so it doesn't discard .eh_frame.
46 */
47 .cfi_sections .debug_frame
48#endif
49
50#else
51
52/*
53 * Due to the structure of pre-exisiting code, don't use assembler line
54 * comment character # to ignore the arguments. Instead, use a dummy macro.
55 */
56.macro cfi_ignore a=0, b=0, c=0, d=0
57.endm
58
59#define CFI_STARTPROC cfi_ignore
60#define CFI_ENDPROC cfi_ignore
61#define CFI_DEF_CFA cfi_ignore
62#define CFI_DEF_CFA_REGISTER cfi_ignore
63#define CFI_DEF_CFA_OFFSET cfi_ignore
64#define CFI_ADJUST_CFA_OFFSET cfi_ignore
65#define CFI_OFFSET cfi_ignore
66#define CFI_REL_OFFSET cfi_ignore
67#define CFI_REGISTER cfi_ignore
68#define CFI_RESTORE cfi_ignore
69#define CFI_REMEMBER_STATE cfi_ignore
70#define CFI_RESTORE_STATE cfi_ignore
71#define CFI_UNDEFINED cfi_ignore
72#define CFI_ESCAPE cfi_ignore
73#define CFI_SIGNAL_FRAME cfi_ignore
74
75#endif
76
77/*
78 * An attempt to make CFI annotations more or less
79 * correct and shorter. It is implied that you know
80 * what you're doing if you use them.
81 */
82#ifdef __ASSEMBLY__
83#ifdef CONFIG_X86_64
84 .macro pushq_cfi reg
85 pushq \reg
86 CFI_ADJUST_CFA_OFFSET 8
87 .endm
88
89 .macro pushq_cfi_reg reg
90 pushq %\reg
91 CFI_ADJUST_CFA_OFFSET 8
92 CFI_REL_OFFSET \reg, 0
93 .endm
94
95 .macro popq_cfi reg
96 popq \reg
97 CFI_ADJUST_CFA_OFFSET -8
98 .endm
99
100 .macro popq_cfi_reg reg
101 popq %\reg
102 CFI_ADJUST_CFA_OFFSET -8
103 CFI_RESTORE \reg
104 .endm
105
106 .macro pushfq_cfi
107 pushfq
108 CFI_ADJUST_CFA_OFFSET 8
109 .endm
110
111 .macro popfq_cfi
112 popfq
113 CFI_ADJUST_CFA_OFFSET -8
114 .endm
115
116 .macro movq_cfi reg offset=0
117 movq %\reg, \offset(%rsp)
118 CFI_REL_OFFSET \reg, \offset
119 .endm
120
121 .macro movq_cfi_restore offset reg
122 movq \offset(%rsp), %\reg
123 CFI_RESTORE \reg
124 .endm
125#else /*!CONFIG_X86_64*/
126 .macro pushl_cfi reg
127 pushl \reg
128 CFI_ADJUST_CFA_OFFSET 4
129 .endm
130
131 .macro pushl_cfi_reg reg
132 pushl %\reg
133 CFI_ADJUST_CFA_OFFSET 4
134 CFI_REL_OFFSET \reg, 0
135 .endm
136
137 .macro popl_cfi reg
138 popl \reg
139 CFI_ADJUST_CFA_OFFSET -4
140 .endm
141
142 .macro popl_cfi_reg reg
143 popl %\reg
144 CFI_ADJUST_CFA_OFFSET -4
145 CFI_RESTORE \reg
146 .endm
147
148 .macro pushfl_cfi
149 pushfl
150 CFI_ADJUST_CFA_OFFSET 4
151 .endm
152
153 .macro popfl_cfi
154 popfl
155 CFI_ADJUST_CFA_OFFSET -4
156 .endm
157
158 .macro movl_cfi reg offset=0
159 movl %\reg, \offset(%esp)
160 CFI_REL_OFFSET \reg, \offset
161 .endm
162
163 .macro movl_cfi_restore offset reg
164 movl \offset(%esp), %\reg
165 CFI_RESTORE \reg
166 .endm
167#endif /*!CONFIG_X86_64*/
168#endif /*__ASSEMBLY__*/
169
170#endif /* _ASM_X86_DWARF2_H */
diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index dc5fa661465f..df002992d8fd 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -23,6 +23,8 @@ BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
23#ifdef CONFIG_HAVE_KVM 23#ifdef CONFIG_HAVE_KVM
24BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR, 24BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
25 smp_kvm_posted_intr_ipi) 25 smp_kvm_posted_intr_ipi)
26BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
27 smp_kvm_posted_intr_wakeup_ipi)
26#endif 28#endif
27 29
28/* 30/*
@@ -50,4 +52,7 @@ BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
50BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR) 52BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
51#endif 53#endif
52 54
55#ifdef CONFIG_X86_MCE_AMD
56BUILD_INTERRUPT(deferred_error_interrupt, DEFERRED_ERROR_VECTOR)
57#endif
53#endif 58#endif
diff --git a/arch/x86/include/asm/frame.h b/arch/x86/include/asm/frame.h
index 3b629f47eb65..793179cf8e21 100644
--- a/arch/x86/include/asm/frame.h
+++ b/arch/x86/include/asm/frame.h
@@ -1,20 +1,17 @@
1#ifdef __ASSEMBLY__ 1#ifdef __ASSEMBLY__
2 2
3#include <asm/asm.h> 3#include <asm/asm.h>
4#include <asm/dwarf2.h>
5 4
6/* The annotation hides the frame from the unwinder and makes it look 5/* The annotation hides the frame from the unwinder and makes it look
 7 like an ordinary ebp save/restore. This avoids some special cases for 6 like an ordinary ebp save/restore. This avoids some special cases for
8 frame pointer later */ 7 frame pointer later */
9#ifdef CONFIG_FRAME_POINTER 8#ifdef CONFIG_FRAME_POINTER
10 .macro FRAME 9 .macro FRAME
11 __ASM_SIZE(push,_cfi) %__ASM_REG(bp) 10 __ASM_SIZE(push,) %__ASM_REG(bp)
12 CFI_REL_OFFSET __ASM_REG(bp), 0
13 __ASM_SIZE(mov) %__ASM_REG(sp), %__ASM_REG(bp) 11 __ASM_SIZE(mov) %__ASM_REG(sp), %__ASM_REG(bp)
14 .endm 12 .endm
15 .macro ENDFRAME 13 .macro ENDFRAME
16 __ASM_SIZE(pop,_cfi) %__ASM_REG(bp) 14 __ASM_SIZE(pop,) %__ASM_REG(bp)
17 CFI_RESTORE __ASM_REG(bp)
18 .endm 15 .endm
19#else 16#else
20 .macro FRAME 17 .macro FRAME
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 0f5fb6b6567e..7178043b0e1d 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -14,6 +14,7 @@ typedef struct {
14#endif 14#endif
15#ifdef CONFIG_HAVE_KVM 15#ifdef CONFIG_HAVE_KVM
16 unsigned int kvm_posted_intr_ipis; 16 unsigned int kvm_posted_intr_ipis;
17 unsigned int kvm_posted_intr_wakeup_ipis;
17#endif 18#endif
18 unsigned int x86_platform_ipis; /* arch dependent */ 19 unsigned int x86_platform_ipis; /* arch dependent */
19 unsigned int apic_perf_irqs; 20 unsigned int apic_perf_irqs;
@@ -33,6 +34,9 @@ typedef struct {
33#ifdef CONFIG_X86_MCE_THRESHOLD 34#ifdef CONFIG_X86_MCE_THRESHOLD
34 unsigned int irq_threshold_count; 35 unsigned int irq_threshold_count;
35#endif 36#endif
37#ifdef CONFIG_X86_MCE_AMD
38 unsigned int irq_deferred_error_count;
39#endif
36#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN) 40#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
37 unsigned int irq_hv_callback_count; 41 unsigned int irq_hv_callback_count;
38#endif 42#endif
diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 36f7125945e3..5fa9fb0f8809 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -74,20 +74,16 @@ extern unsigned int hpet_readl(unsigned int a);
74extern void force_hpet_resume(void); 74extern void force_hpet_resume(void);
75 75
76struct irq_data; 76struct irq_data;
77struct hpet_dev;
78struct irq_domain;
79
77extern void hpet_msi_unmask(struct irq_data *data); 80extern void hpet_msi_unmask(struct irq_data *data);
78extern void hpet_msi_mask(struct irq_data *data); 81extern void hpet_msi_mask(struct irq_data *data);
79struct hpet_dev;
80extern void hpet_msi_write(struct hpet_dev *hdev, struct msi_msg *msg); 82extern void hpet_msi_write(struct hpet_dev *hdev, struct msi_msg *msg);
81extern void hpet_msi_read(struct hpet_dev *hdev, struct msi_msg *msg); 83extern void hpet_msi_read(struct hpet_dev *hdev, struct msi_msg *msg);
82 84extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
83#ifdef CONFIG_PCI_MSI 85extern int hpet_assign_irq(struct irq_domain *domain,
84extern int default_setup_hpet_msi(unsigned int irq, unsigned int id); 86 struct hpet_dev *dev, int dev_num);
85#else
86static inline int default_setup_hpet_msi(unsigned int irq, unsigned int id)
87{
88 return -EINVAL;
89}
90#endif
91 87
92#ifdef CONFIG_HPET_EMULATE_RTC 88#ifdef CONFIG_HPET_EMULATE_RTC
93 89
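The two declarations added above replace default_setup_hpet_msi(): the HPET code now creates its own MSI irqdomain and allocates per-comparator interrupts from it. A minimal sketch of the intended flow; hpet_blockid and hdev (with its comparator number in hdev->num) are assumed to come from the existing HPET driver state rather than from this hunk:

	/* once per HPET block: create a domain parented on the vector/remapping domain */
	struct irq_domain *domain = hpet_create_irq_domain(hpet_blockid);

	/* then per comparator channel: allocate its interrupt from that domain */
	int irq = hpet_assign_irq(domain, hdev, hdev->num);
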
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e9571ddabc4f..6615032e19c8 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -29,6 +29,7 @@
29extern asmlinkage void apic_timer_interrupt(void); 29extern asmlinkage void apic_timer_interrupt(void);
30extern asmlinkage void x86_platform_ipi(void); 30extern asmlinkage void x86_platform_ipi(void);
31extern asmlinkage void kvm_posted_intr_ipi(void); 31extern asmlinkage void kvm_posted_intr_ipi(void);
32extern asmlinkage void kvm_posted_intr_wakeup_ipi(void);
32extern asmlinkage void error_interrupt(void); 33extern asmlinkage void error_interrupt(void);
33extern asmlinkage void irq_work_interrupt(void); 34extern asmlinkage void irq_work_interrupt(void);
34 35
@@ -36,43 +37,10 @@ extern asmlinkage void spurious_interrupt(void);
36extern asmlinkage void thermal_interrupt(void); 37extern asmlinkage void thermal_interrupt(void);
37extern asmlinkage void reschedule_interrupt(void); 38extern asmlinkage void reschedule_interrupt(void);
38 39
39extern asmlinkage void invalidate_interrupt(void);
40extern asmlinkage void invalidate_interrupt0(void);
41extern asmlinkage void invalidate_interrupt1(void);
42extern asmlinkage void invalidate_interrupt2(void);
43extern asmlinkage void invalidate_interrupt3(void);
44extern asmlinkage void invalidate_interrupt4(void);
45extern asmlinkage void invalidate_interrupt5(void);
46extern asmlinkage void invalidate_interrupt6(void);
47extern asmlinkage void invalidate_interrupt7(void);
48extern asmlinkage void invalidate_interrupt8(void);
49extern asmlinkage void invalidate_interrupt9(void);
50extern asmlinkage void invalidate_interrupt10(void);
51extern asmlinkage void invalidate_interrupt11(void);
52extern asmlinkage void invalidate_interrupt12(void);
53extern asmlinkage void invalidate_interrupt13(void);
54extern asmlinkage void invalidate_interrupt14(void);
55extern asmlinkage void invalidate_interrupt15(void);
56extern asmlinkage void invalidate_interrupt16(void);
57extern asmlinkage void invalidate_interrupt17(void);
58extern asmlinkage void invalidate_interrupt18(void);
59extern asmlinkage void invalidate_interrupt19(void);
60extern asmlinkage void invalidate_interrupt20(void);
61extern asmlinkage void invalidate_interrupt21(void);
62extern asmlinkage void invalidate_interrupt22(void);
63extern asmlinkage void invalidate_interrupt23(void);
64extern asmlinkage void invalidate_interrupt24(void);
65extern asmlinkage void invalidate_interrupt25(void);
66extern asmlinkage void invalidate_interrupt26(void);
67extern asmlinkage void invalidate_interrupt27(void);
68extern asmlinkage void invalidate_interrupt28(void);
69extern asmlinkage void invalidate_interrupt29(void);
70extern asmlinkage void invalidate_interrupt30(void);
71extern asmlinkage void invalidate_interrupt31(void);
72
73extern asmlinkage void irq_move_cleanup_interrupt(void); 40extern asmlinkage void irq_move_cleanup_interrupt(void);
74extern asmlinkage void reboot_interrupt(void); 41extern asmlinkage void reboot_interrupt(void);
75extern asmlinkage void threshold_interrupt(void); 42extern asmlinkage void threshold_interrupt(void);
43extern asmlinkage void deferred_error_interrupt(void);
76 44
77extern asmlinkage void call_function_interrupt(void); 45extern asmlinkage void call_function_interrupt(void);
78extern asmlinkage void call_function_single_interrupt(void); 46extern asmlinkage void call_function_single_interrupt(void);
@@ -87,60 +55,93 @@ extern void trace_spurious_interrupt(void);
87extern void trace_thermal_interrupt(void); 55extern void trace_thermal_interrupt(void);
88extern void trace_reschedule_interrupt(void); 56extern void trace_reschedule_interrupt(void);
89extern void trace_threshold_interrupt(void); 57extern void trace_threshold_interrupt(void);
58extern void trace_deferred_error_interrupt(void);
90extern void trace_call_function_interrupt(void); 59extern void trace_call_function_interrupt(void);
91extern void trace_call_function_single_interrupt(void); 60extern void trace_call_function_single_interrupt(void);
92#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt 61#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt
93#define trace_reboot_interrupt reboot_interrupt 62#define trace_reboot_interrupt reboot_interrupt
94#define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi 63#define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
64#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
95#endif /* CONFIG_TRACING */ 65#endif /* CONFIG_TRACING */
96 66
97#ifdef CONFIG_IRQ_REMAP
98/* Intel specific interrupt remapping information */
99struct irq_2_iommu {
100 struct intel_iommu *iommu;
101 u16 irte_index;
102 u16 sub_handle;
103 u8 irte_mask;
104};
105
106/* AMD specific interrupt remapping information */
107struct irq_2_irte {
108 u16 devid; /* Device ID for IRTE table */
109 u16 index; /* Index into IRTE table*/
110};
111#endif /* CONFIG_IRQ_REMAP */
112
113#ifdef CONFIG_X86_LOCAL_APIC 67#ifdef CONFIG_X86_LOCAL_APIC
114struct irq_data; 68struct irq_data;
69struct pci_dev;
70struct msi_desc;
71
72enum irq_alloc_type {
73 X86_IRQ_ALLOC_TYPE_IOAPIC = 1,
74 X86_IRQ_ALLOC_TYPE_HPET,
75 X86_IRQ_ALLOC_TYPE_MSI,
76 X86_IRQ_ALLOC_TYPE_MSIX,
77 X86_IRQ_ALLOC_TYPE_DMAR,
78 X86_IRQ_ALLOC_TYPE_UV,
79};
115 80
116struct irq_cfg { 81struct irq_alloc_info {
117 cpumask_var_t domain; 82 enum irq_alloc_type type;
118 cpumask_var_t old_domain; 83 u32 flags;
119 u8 vector; 84 const struct cpumask *mask; /* CPU mask for vector allocation */
120 u8 move_in_progress : 1;
121#ifdef CONFIG_IRQ_REMAP
122 u8 remapped : 1;
123 union { 85 union {
124 struct irq_2_iommu irq_2_iommu; 86 int unused;
125 struct irq_2_irte irq_2_irte; 87#ifdef CONFIG_HPET_TIMER
126 }; 88 struct {
89 int hpet_id;
90 int hpet_index;
91 void *hpet_data;
92 };
127#endif 93#endif
128 union { 94#ifdef CONFIG_PCI_MSI
129#ifdef CONFIG_X86_IO_APIC
130 struct { 95 struct {
131 struct list_head irq_2_pin; 96 struct pci_dev *msi_dev;
97 irq_hw_number_t msi_hwirq;
98 };
99#endif
100#ifdef CONFIG_X86_IO_APIC
101 struct {
102 int ioapic_id;
103 int ioapic_pin;
104 int ioapic_node;
105 u32 ioapic_trigger : 1;
106 u32 ioapic_polarity : 1;
107 u32 ioapic_valid : 1;
108 struct IO_APIC_route_entry *ioapic_entry;
109 };
110#endif
111#ifdef CONFIG_DMAR_TABLE
112 struct {
113 int dmar_id;
114 void *dmar_data;
115 };
116#endif
117#ifdef CONFIG_HT_IRQ
118 struct {
119 int ht_pos;
120 int ht_idx;
121 struct pci_dev *ht_dev;
122 void *ht_update;
123 };
124#endif
125#ifdef CONFIG_X86_UV
126 struct {
127 int uv_limit;
128 int uv_blade;
129 unsigned long uv_offset;
130 char *uv_name;
132 }; 131 };
133#endif 132#endif
134 }; 133 };
135}; 134};
136 135
136struct irq_cfg {
137 unsigned int dest_apicid;
138 u8 vector;
139};
140
137extern struct irq_cfg *irq_cfg(unsigned int irq); 141extern struct irq_cfg *irq_cfg(unsigned int irq);
138extern struct irq_cfg *irqd_cfg(struct irq_data *irq_data); 142extern struct irq_cfg *irqd_cfg(struct irq_data *irq_data);
139extern struct irq_cfg *alloc_irq_and_cfg_at(unsigned int at, int node);
140extern void lock_vector_lock(void); 143extern void lock_vector_lock(void);
141extern void unlock_vector_lock(void); 144extern void unlock_vector_lock(void);
142extern int assign_irq_vector(int, struct irq_cfg *, const struct cpumask *);
143extern void clear_irq_vector(int irq, struct irq_cfg *cfg);
144extern void setup_vector_irq(int cpu); 145extern void setup_vector_irq(int cpu);
145#ifdef CONFIG_SMP 146#ifdef CONFIG_SMP
146extern void send_cleanup_vector(struct irq_cfg *); 147extern void send_cleanup_vector(struct irq_cfg *);
@@ -150,10 +151,7 @@ static inline void send_cleanup_vector(struct irq_cfg *c) { }
150static inline void irq_complete_move(struct irq_cfg *c) { } 151static inline void irq_complete_move(struct irq_cfg *c) { }
151#endif 152#endif
152 153
153extern int apic_retrigger_irq(struct irq_data *data);
154extern void apic_ack_edge(struct irq_data *data); 154extern void apic_ack_edge(struct irq_data *data);
155extern int apic_set_affinity(struct irq_data *data, const struct cpumask *mask,
156 unsigned int *dest_id);
157#else /* CONFIG_X86_LOCAL_APIC */ 155#else /* CONFIG_X86_LOCAL_APIC */
158static inline void lock_vector_lock(void) {} 156static inline void lock_vector_lock(void) {}
159static inline void unlock_vector_lock(void) {} 157static inline void unlock_vector_lock(void) {}
@@ -163,8 +161,7 @@ static inline void unlock_vector_lock(void) {}
163extern atomic_t irq_err_count; 161extern atomic_t irq_err_count;
164extern atomic_t irq_mis_count; 162extern atomic_t irq_mis_count;
165 163
166/* EISA */ 164extern void elcr_set_level_irq(unsigned int irq);
167extern void eisa_set_level_irq(unsigned int irq);
168 165
169/* SMP */ 166/* SMP */
170extern __visible void smp_apic_timer_interrupt(struct pt_regs *); 167extern __visible void smp_apic_timer_interrupt(struct pt_regs *);
@@ -178,7 +175,6 @@ extern asmlinkage void smp_irq_move_cleanup_interrupt(void);
178extern __visible void smp_reschedule_interrupt(struct pt_regs *); 175extern __visible void smp_reschedule_interrupt(struct pt_regs *);
179extern __visible void smp_call_function_interrupt(struct pt_regs *); 176extern __visible void smp_call_function_interrupt(struct pt_regs *);
180extern __visible void smp_call_function_single_interrupt(struct pt_regs *); 177extern __visible void smp_call_function_single_interrupt(struct pt_regs *);
181extern __visible void smp_invalidate_interrupt(struct pt_regs *);
182#endif 178#endif
183 179
184extern char irq_entries_start[]; 180extern char irq_entries_start[];
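struct irq_alloc_info above is the per-allocation descriptor that the hierarchical irqdomains hand down to the vector domain; the union carries only the fields the requesting domain cares about. A sketch of how the MSI path might fill it, assuming init_irq_alloc_info() from the new <asm/irqdomain.h> further down and pdev as the PCI device being configured:

	struct irq_alloc_info info;

	init_irq_alloc_info(&info, NULL);	/* clear the info, no fixed CPU mask */
	info.type    = X86_IRQ_ALLOC_TYPE_MSI;
	info.msi_dev = pdev;			/* owning PCI device */
	/* &info is then passed down the MSI irqdomain's allocation path */
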
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 34a5b93704d3..83ec9b1d77cc 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -35,11 +35,13 @@
35 */ 35 */
36 36
37#define ARCH_HAS_IOREMAP_WC 37#define ARCH_HAS_IOREMAP_WC
38#define ARCH_HAS_IOREMAP_WT
38 39
39#include <linux/string.h> 40#include <linux/string.h>
40#include <linux/compiler.h> 41#include <linux/compiler.h>
41#include <asm/page.h> 42#include <asm/page.h>
42#include <asm/early_ioremap.h> 43#include <asm/early_ioremap.h>
44#include <asm/pgtable_types.h>
43 45
44#define build_mmio_read(name, size, type, reg, barrier) \ 46#define build_mmio_read(name, size, type, reg, barrier) \
45static inline type name(const volatile void __iomem *addr) \ 47static inline type name(const volatile void __iomem *addr) \
@@ -177,6 +179,7 @@ static inline unsigned int isa_virt_to_bus(volatile void *address)
177 * look at pci_iomap(). 179 * look at pci_iomap().
178 */ 180 */
179extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size); 181extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
182extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size);
180extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); 183extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
181extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 184extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size,
182 unsigned long prot_val); 185 unsigned long prot_val);
@@ -197,8 +200,6 @@ extern void set_iounmap_nonlazy(void);
197 200
198#include <asm-generic/iomap.h> 201#include <asm-generic/iomap.h>
199 202
200#include <linux/vmalloc.h>
201
202/* 203/*
203 * Convert a virtual cached pointer to an uncached pointer 204 * Convert a virtual cached pointer to an uncached pointer
204 */ 205 */
@@ -320,6 +321,7 @@ extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
320extern int ioremap_change_attr(unsigned long vaddr, unsigned long size, 321extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
321 enum page_cache_mode pcm); 322 enum page_cache_mode pcm);
322extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size); 323extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
324extern void __iomem *ioremap_wt(resource_size_t offset, unsigned long size);
323 325
324extern bool is_early_ioremap_ptep(pte_t *ptep); 326extern bool is_early_ioremap_ptep(pte_t *ptep);
325 327
@@ -338,6 +340,9 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
338#define IO_SPACE_LIMIT 0xffff 340#define IO_SPACE_LIMIT 0xffff
339 341
340#ifdef CONFIG_MTRR 342#ifdef CONFIG_MTRR
343extern int __must_check arch_phys_wc_index(int handle);
344#define arch_phys_wc_index arch_phys_wc_index
345
341extern int __must_check arch_phys_wc_add(unsigned long base, 346extern int __must_check arch_phys_wc_add(unsigned long base,
342 unsigned long size); 347 unsigned long size);
343extern void arch_phys_wc_del(int handle); 348extern void arch_phys_wc_del(int handle);
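ioremap_wt() above is the write-through mapping flavour announced by ARCH_HAS_IOREMAP_WT. A hedged driver-style sketch; map_and_flush() and its arguments are invented for illustration:

	#include <linux/io.h>
	#include <linux/ioport.h>

	static int map_and_flush(struct resource *res, const void *buf, size_t len)
	{
		void __iomem *va = ioremap_wt(res->start, resource_size(res));

		if (!va)
			return -ENOMEM;
		memcpy_toio(va, buf, len);	/* reads may be cached, writes go through */
		iounmap(va);
		return 0;
	}
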
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 2f91685fe1cd..6cbf2cfb3f8a 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -95,9 +95,22 @@ struct IR_IO_APIC_route_entry {
95 index : 15; 95 index : 15;
96} __attribute__ ((packed)); 96} __attribute__ ((packed));
97 97
98#define IOAPIC_AUTO -1 98struct irq_alloc_info;
99#define IOAPIC_EDGE 0 99struct ioapic_domain_cfg;
100#define IOAPIC_LEVEL 1 100
101#define IOAPIC_AUTO -1
102#define IOAPIC_EDGE 0
103#define IOAPIC_LEVEL 1
104
105#define IOAPIC_MASKED 1
106#define IOAPIC_UNMASKED 0
107
108#define IOAPIC_POL_HIGH 0
109#define IOAPIC_POL_LOW 1
110
111#define IOAPIC_DEST_MODE_PHYSICAL 0
112#define IOAPIC_DEST_MODE_LOGICAL 1
113
101#define IOAPIC_MAP_ALLOC 0x1 114#define IOAPIC_MAP_ALLOC 0x1
102#define IOAPIC_MAP_CHECK 0x2 115#define IOAPIC_MAP_CHECK 0x2
103 116
@@ -110,9 +123,6 @@ extern int nr_ioapics;
110 123
111extern int mpc_ioapic_id(int ioapic); 124extern int mpc_ioapic_id(int ioapic);
112extern unsigned int mpc_ioapic_addr(int ioapic); 125extern unsigned int mpc_ioapic_addr(int ioapic);
113extern struct mp_ioapic_gsi *mp_ioapic_gsi_routing(int ioapic);
114
115#define MP_MAX_IOAPIC_PIN 127
116 126
117/* # of MP IRQ source entries */ 127/* # of MP IRQ source entries */
118extern int mp_irq_entries; 128extern int mp_irq_entries;
@@ -120,9 +130,6 @@ extern int mp_irq_entries;
120/* MP IRQ source entries */ 130/* MP IRQ source entries */
121extern struct mpc_intsrc mp_irqs[MAX_IRQ_SOURCES]; 131extern struct mpc_intsrc mp_irqs[MAX_IRQ_SOURCES];
122 132
123/* Older SiS APIC requires we rewrite the index register */
124extern int sis_apic_bug;
125
126/* 1 if "noapic" boot option passed */ 133/* 1 if "noapic" boot option passed */
127extern int skip_ioapic_setup; 134extern int skip_ioapic_setup;
128 135
@@ -132,6 +139,8 @@ extern int noioapicquirk;
132/* -1 if "noapic" boot option passed */ 139/* -1 if "noapic" boot option passed */
133extern int noioapicreroute; 140extern int noioapicreroute;
134 141
142extern u32 gsi_top;
143
135extern unsigned long io_apic_irqs; 144extern unsigned long io_apic_irqs;
136 145
137#define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1 << (x)) & io_apic_irqs)) 146#define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1 << (x)) & io_apic_irqs))
@@ -147,13 +156,6 @@ struct irq_cfg;
147extern void ioapic_insert_resources(void); 156extern void ioapic_insert_resources(void);
148extern int arch_early_ioapic_init(void); 157extern int arch_early_ioapic_init(void);
149 158
150extern int native_setup_ioapic_entry(int, struct IO_APIC_route_entry *,
151 unsigned int, int,
152 struct io_apic_irq_attr *);
153extern void eoi_ioapic_irq(unsigned int irq, struct irq_cfg *cfg);
154
155extern void native_eoi_ioapic_pin(int apic, int pin, int vector);
156
157extern int save_ioapic_entries(void); 159extern int save_ioapic_entries(void);
158extern void mask_ioapic_entries(void); 160extern void mask_ioapic_entries(void);
159extern int restore_ioapic_entries(void); 161extern int restore_ioapic_entries(void);
@@ -161,82 +163,32 @@ extern int restore_ioapic_entries(void);
161extern void setup_ioapic_ids_from_mpc(void); 163extern void setup_ioapic_ids_from_mpc(void);
162extern void setup_ioapic_ids_from_mpc_nocheck(void); 164extern void setup_ioapic_ids_from_mpc_nocheck(void);
163 165
164struct io_apic_irq_attr {
165 int ioapic;
166 int ioapic_pin;
167 int trigger;
168 int polarity;
169};
170
171enum ioapic_domain_type {
172 IOAPIC_DOMAIN_INVALID,
173 IOAPIC_DOMAIN_LEGACY,
174 IOAPIC_DOMAIN_STRICT,
175 IOAPIC_DOMAIN_DYNAMIC,
176};
177
178struct device_node;
179struct irq_domain;
180struct irq_domain_ops;
181
182struct ioapic_domain_cfg {
183 enum ioapic_domain_type type;
184 const struct irq_domain_ops *ops;
185 struct device_node *dev;
186};
187
188struct mp_ioapic_gsi{
189 u32 gsi_base;
190 u32 gsi_end;
191};
192extern u32 gsi_top;
193
194extern int mp_find_ioapic(u32 gsi); 166extern int mp_find_ioapic(u32 gsi);
195extern int mp_find_ioapic_pin(int ioapic, u32 gsi); 167extern int mp_find_ioapic_pin(int ioapic, u32 gsi);
196extern u32 mp_pin_to_gsi(int ioapic, int pin); 168extern int mp_map_gsi_to_irq(u32 gsi, unsigned int flags,
197extern int mp_map_gsi_to_irq(u32 gsi, unsigned int flags); 169 struct irq_alloc_info *info);
198extern void mp_unmap_irq(int irq); 170extern void mp_unmap_irq(int irq);
199extern int mp_register_ioapic(int id, u32 address, u32 gsi_base, 171extern int mp_register_ioapic(int id, u32 address, u32 gsi_base,
200 struct ioapic_domain_cfg *cfg); 172 struct ioapic_domain_cfg *cfg);
201extern int mp_unregister_ioapic(u32 gsi_base); 173extern int mp_unregister_ioapic(u32 gsi_base);
202extern int mp_ioapic_registered(u32 gsi_base); 174extern int mp_ioapic_registered(u32 gsi_base);
203extern int mp_irqdomain_map(struct irq_domain *domain, unsigned int virq, 175
204 irq_hw_number_t hwirq); 176extern void ioapic_set_alloc_attr(struct irq_alloc_info *info,
205extern void mp_irqdomain_unmap(struct irq_domain *domain, unsigned int virq); 177 int node, int trigger, int polarity);
206extern int mp_set_gsi_attr(u32 gsi, int trigger, int polarity, int node);
207extern void __init pre_init_apic_IRQ0(void);
208 178
209extern void mp_save_irq(struct mpc_intsrc *m); 179extern void mp_save_irq(struct mpc_intsrc *m);
210 180
211extern void disable_ioapic_support(void); 181extern void disable_ioapic_support(void);
212 182
213extern void __init native_io_apic_init_mappings(void); 183extern void __init io_apic_init_mappings(void);
214extern unsigned int native_io_apic_read(unsigned int apic, unsigned int reg); 184extern unsigned int native_io_apic_read(unsigned int apic, unsigned int reg);
215extern void native_io_apic_write(unsigned int apic, unsigned int reg, unsigned int val);
216extern void native_io_apic_modify(unsigned int apic, unsigned int reg, unsigned int val);
217extern void native_disable_io_apic(void); 185extern void native_disable_io_apic(void);
218extern void native_io_apic_print_entries(unsigned int apic, unsigned int nr_entries);
219extern void intel_ir_io_apic_print_entries(unsigned int apic, unsigned int nr_entries);
220extern int native_ioapic_set_affinity(struct irq_data *,
221 const struct cpumask *,
222 bool);
223 186
224static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg) 187static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
225{ 188{
226 return x86_io_apic_ops.read(apic, reg); 189 return x86_io_apic_ops.read(apic, reg);
227} 190}
228 191
229static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
230{
231 x86_io_apic_ops.write(apic, reg, value);
232}
233static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
234{
235 x86_io_apic_ops.modify(apic, reg, value);
236}
237
238extern void io_apic_eoi(unsigned int apic, unsigned int vector);
239
240extern void setup_IO_APIC(void); 192extern void setup_IO_APIC(void);
241extern void enable_IO_APIC(void); 193extern void enable_IO_APIC(void);
242extern void disable_IO_APIC(void); 194extern void disable_IO_APIC(void);
@@ -253,8 +205,12 @@ static inline int arch_early_ioapic_init(void) { return 0; }
253static inline void print_IO_APICs(void) {} 205static inline void print_IO_APICs(void) {}
254#define gsi_top (NR_IRQS_LEGACY) 206#define gsi_top (NR_IRQS_LEGACY)
255static inline int mp_find_ioapic(u32 gsi) { return 0; } 207static inline int mp_find_ioapic(u32 gsi) { return 0; }
256static inline u32 mp_pin_to_gsi(int ioapic, int pin) { return UINT_MAX; } 208static inline int mp_map_gsi_to_irq(u32 gsi, unsigned int flags,
257static inline int mp_map_gsi_to_irq(u32 gsi, unsigned int flags) { return gsi; } 209 struct irq_alloc_info *info)
210{
211 return gsi;
212}
213
258static inline void mp_unmap_irq(int irq) { } 214static inline void mp_unmap_irq(int irq) { }
259 215
260static inline int save_ioapic_entries(void) 216static inline int save_ioapic_entries(void)
@@ -268,17 +224,11 @@ static inline int restore_ioapic_entries(void)
268 return -ENOMEM; 224 return -ENOMEM;
269} 225}
270 226
271static inline void mp_save_irq(struct mpc_intsrc *m) { }; 227static inline void mp_save_irq(struct mpc_intsrc *m) { }
272static inline void disable_ioapic_support(void) { } 228static inline void disable_ioapic_support(void) { }
273#define native_io_apic_init_mappings NULL 229static inline void io_apic_init_mappings(void) { }
274#define native_io_apic_read NULL 230#define native_io_apic_read NULL
275#define native_io_apic_write NULL
276#define native_io_apic_modify NULL
277#define native_disable_io_apic NULL 231#define native_disable_io_apic NULL
278#define native_io_apic_print_entries NULL
279#define native_ioapic_set_affinity NULL
280#define native_setup_ioapic_entry NULL
281#define native_eoi_ioapic_pin NULL
282 232
283static inline void setup_IO_APIC(void) { } 233static inline void setup_IO_APIC(void) { }
284static inline void enable_IO_APIC(void) { } 234static inline void enable_IO_APIC(void) { }
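mp_map_gsi_to_irq() now takes a struct irq_alloc_info instead of per-call trigger/polarity arguments, with ioapic_set_alloc_attr() filling in the attributes. A sketch of ACPI-style usage; the wrapper name is made up, gsi/node/trigger/polarity are assumed to come from the caller:

	static int map_acpi_gsi(u32 gsi, int node, int trigger, int polarity)
	{
		struct irq_alloc_info info;

		ioapic_set_alloc_attr(&info, node, trigger, polarity);
		return mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC | IOAPIC_MAP_CHECK, &info);
	}
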
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index a80cbb88ea91..8008d06581c7 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -30,6 +30,10 @@ extern void fixup_irqs(void);
30extern void irq_force_complete_move(int); 30extern void irq_force_complete_move(int);
31#endif 31#endif
32 32
33#ifdef CONFIG_HAVE_KVM
34extern void kvm_set_posted_intr_wakeup_handler(void (*handler)(void));
35#endif
36
33extern void (*x86_platform_ipi_callback)(void); 37extern void (*x86_platform_ipi_callback)(void);
34extern void native_init_IRQ(void); 38extern void native_init_IRQ(void);
35extern bool handle_irq(unsigned irq, struct pt_regs *regs); 39extern bool handle_irq(unsigned irq, struct pt_regs *regs);
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 6224d316c405..046c7fb1ca43 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -22,84 +22,72 @@
22#ifndef __X86_IRQ_REMAPPING_H 22#ifndef __X86_IRQ_REMAPPING_H
23#define __X86_IRQ_REMAPPING_H 23#define __X86_IRQ_REMAPPING_H
24 24
25#include <asm/irqdomain.h>
26#include <asm/hw_irq.h>
25#include <asm/io_apic.h> 27#include <asm/io_apic.h>
26 28
27struct IO_APIC_route_entry;
28struct io_apic_irq_attr;
29struct irq_chip;
30struct msi_msg; 29struct msi_msg;
31struct pci_dev; 30struct irq_alloc_info;
32struct irq_cfg; 31
32enum irq_remap_cap {
33 IRQ_POSTING_CAP = 0,
34};
33 35
34#ifdef CONFIG_IRQ_REMAP 36#ifdef CONFIG_IRQ_REMAP
35 37
38extern bool irq_remapping_cap(enum irq_remap_cap cap);
36extern void set_irq_remapping_broken(void); 39extern void set_irq_remapping_broken(void);
37extern int irq_remapping_prepare(void); 40extern int irq_remapping_prepare(void);
38extern int irq_remapping_enable(void); 41extern int irq_remapping_enable(void);
39extern void irq_remapping_disable(void); 42extern void irq_remapping_disable(void);
40extern int irq_remapping_reenable(int); 43extern int irq_remapping_reenable(int);
41extern int irq_remap_enable_fault_handling(void); 44extern int irq_remap_enable_fault_handling(void);
42extern int setup_ioapic_remapped_entry(int irq,
43 struct IO_APIC_route_entry *entry,
44 unsigned int destination,
45 int vector,
46 struct io_apic_irq_attr *attr);
47extern void free_remapped_irq(int irq);
48extern void compose_remapped_msi_msg(struct pci_dev *pdev,
49 unsigned int irq, unsigned int dest,
50 struct msi_msg *msg, u8 hpet_id);
51extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
52extern void panic_if_irq_remap(const char *msg); 45extern void panic_if_irq_remap(const char *msg);
53extern bool setup_remapped_irq(int irq,
54 struct irq_cfg *cfg,
55 struct irq_chip *chip);
56 46
57void irq_remap_modify_chip_defaults(struct irq_chip *chip); 47extern struct irq_domain *
48irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info);
49extern struct irq_domain *
50irq_remapping_get_irq_domain(struct irq_alloc_info *info);
51
52/* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
53extern struct irq_domain *arch_create_msi_irq_domain(struct irq_domain *parent);
54
55/* Get parent irqdomain for interrupt remapping irqdomain */
56static inline struct irq_domain *arch_get_ir_parent_domain(void)
57{
58 return x86_vector_domain;
59}
60
61struct vcpu_data {
62 u64 pi_desc_addr; /* Physical address of PI Descriptor */
63 u32 vector; /* Guest vector of the interrupt */
64};
58 65
59#else /* CONFIG_IRQ_REMAP */ 66#else /* CONFIG_IRQ_REMAP */
60 67
68static inline bool irq_remapping_cap(enum irq_remap_cap cap) { return 0; }
61static inline void set_irq_remapping_broken(void) { } 69static inline void set_irq_remapping_broken(void) { }
62static inline int irq_remapping_prepare(void) { return -ENODEV; } 70static inline int irq_remapping_prepare(void) { return -ENODEV; }
63static inline int irq_remapping_enable(void) { return -ENODEV; } 71static inline int irq_remapping_enable(void) { return -ENODEV; }
64static inline void irq_remapping_disable(void) { } 72static inline void irq_remapping_disable(void) { }
65static inline int irq_remapping_reenable(int eim) { return -ENODEV; } 73static inline int irq_remapping_reenable(int eim) { return -ENODEV; }
66static inline int irq_remap_enable_fault_handling(void) { return -ENODEV; } 74static inline int irq_remap_enable_fault_handling(void) { return -ENODEV; }
67static inline int setup_ioapic_remapped_entry(int irq,
68 struct IO_APIC_route_entry *entry,
69 unsigned int destination,
70 int vector,
71 struct io_apic_irq_attr *attr)
72{
73 return -ENODEV;
74}
75static inline void free_remapped_irq(int irq) { }
76static inline void compose_remapped_msi_msg(struct pci_dev *pdev,
77 unsigned int irq, unsigned int dest,
78 struct msi_msg *msg, u8 hpet_id)
79{
80}
81static inline int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
82{
83 return -ENODEV;
84}
85 75
86static inline void panic_if_irq_remap(const char *msg) 76static inline void panic_if_irq_remap(const char *msg)
87{ 77{
88} 78}
89 79
90static inline void irq_remap_modify_chip_defaults(struct irq_chip *chip) 80static inline struct irq_domain *
81irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
91{ 82{
83 return NULL;
92} 84}
93 85
94static inline bool setup_remapped_irq(int irq, 86static inline struct irq_domain *
95 struct irq_cfg *cfg, 87irq_remapping_get_irq_domain(struct irq_alloc_info *info)
96 struct irq_chip *chip)
97{ 88{
98 return false; 89 return NULL;
99} 90}
100#endif /* CONFIG_IRQ_REMAP */
101
102#define dmar_alloc_hwirq() irq_alloc_hwirq(-1)
103#define dmar_free_hwirq irq_free_hwirq
104 91
92#endif /* CONFIG_IRQ_REMAP */
105#endif /* __X86_IRQ_REMAPPING_H */ 93#endif /* __X86_IRQ_REMAPPING_H */
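irq_remapping_cap() and struct vcpu_data are the hooks the direct posted-interrupt injection work builds on. A hedged caller-side sketch, with pi_desc_pa and guest_vector standing in for the VMM's posted-interrupt descriptor address and vector:

	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
		struct vcpu_data vcpu_info = {
			.pi_desc_addr = pi_desc_pa,	/* phys addr of the PI descriptor */
			.vector	      = guest_vector,	/* vector to post into the guest */
		};

		/* &vcpu_info is then handed to the remapping driver, which
		 * rewrites the IRTE into posted format */
	}
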
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 666c89ec4bd7..4c2d2eb2060a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -47,31 +47,12 @@
47#define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR 47#define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR
48 48
49#define IA32_SYSCALL_VECTOR 0x80 49#define IA32_SYSCALL_VECTOR 0x80
50#ifdef CONFIG_X86_32
51# define SYSCALL_VECTOR 0x80
52#endif
53 50
54/* 51/*
55 * Vectors 0x30-0x3f are used for ISA interrupts. 52 * Vectors 0x30-0x3f are used for ISA interrupts.
56 * round up to the next 16-vector boundary 53 * round up to the next 16-vector boundary
57 */ 54 */
58#define IRQ0_VECTOR ((FIRST_EXTERNAL_VECTOR + 16) & ~15) 55#define ISA_IRQ_VECTOR(irq) (((FIRST_EXTERNAL_VECTOR + 16) & ~15) + irq)
59
60#define IRQ1_VECTOR (IRQ0_VECTOR + 1)
61#define IRQ2_VECTOR (IRQ0_VECTOR + 2)
62#define IRQ3_VECTOR (IRQ0_VECTOR + 3)
63#define IRQ4_VECTOR (IRQ0_VECTOR + 4)
64#define IRQ5_VECTOR (IRQ0_VECTOR + 5)
65#define IRQ6_VECTOR (IRQ0_VECTOR + 6)
66#define IRQ7_VECTOR (IRQ0_VECTOR + 7)
67#define IRQ8_VECTOR (IRQ0_VECTOR + 8)
68#define IRQ9_VECTOR (IRQ0_VECTOR + 9)
69#define IRQ10_VECTOR (IRQ0_VECTOR + 10)
70#define IRQ11_VECTOR (IRQ0_VECTOR + 11)
71#define IRQ12_VECTOR (IRQ0_VECTOR + 12)
72#define IRQ13_VECTOR (IRQ0_VECTOR + 13)
73#define IRQ14_VECTOR (IRQ0_VECTOR + 14)
74#define IRQ15_VECTOR (IRQ0_VECTOR + 15)
75 56
76/* 57/*
77 * Special IRQ vectors used by the SMP architecture, 0xf0-0xff 58 * Special IRQ vectors used by the SMP architecture, 0xf0-0xff
@@ -102,21 +83,23 @@
102 */ 83 */
103#define X86_PLATFORM_IPI_VECTOR 0xf7 84#define X86_PLATFORM_IPI_VECTOR 0xf7
104 85
105/* Vector for KVM to deliver posted interrupt IPI */ 86#define POSTED_INTR_WAKEUP_VECTOR 0xf1
106#ifdef CONFIG_HAVE_KVM
107#define POSTED_INTR_VECTOR 0xf2
108#endif
109
110/* 87/*
111 * IRQ work vector: 88 * IRQ work vector:
112 */ 89 */
113#define IRQ_WORK_VECTOR 0xf6 90#define IRQ_WORK_VECTOR 0xf6
114 91
115#define UV_BAU_MESSAGE 0xf5 92#define UV_BAU_MESSAGE 0xf5
93#define DEFERRED_ERROR_VECTOR 0xf4
116 94
117/* Vector on which hypervisor callbacks will be delivered */ 95/* Vector on which hypervisor callbacks will be delivered */
118#define HYPERVISOR_CALLBACK_VECTOR 0xf3 96#define HYPERVISOR_CALLBACK_VECTOR 0xf3
119 97
98/* Vector for KVM to deliver posted interrupt IPI */
99#ifdef CONFIG_HAVE_KVM
100#define POSTED_INTR_VECTOR 0xf2
101#endif
102
120/* 103/*
121 * Local APIC timer IRQ vector is on a different priority level, 104 * Local APIC timer IRQ vector is on a different priority level,
122 * to work around the 'lost local interrupt if more than 2 IRQ 105 * to work around the 'lost local interrupt if more than 2 IRQ
@@ -155,18 +138,22 @@ static inline int invalid_vm86_irq(int irq)
155 * static arrays. 138 * static arrays.
156 */ 139 */
157 140
158#define NR_IRQS_LEGACY 16 141#define NR_IRQS_LEGACY 16
159 142
160#define IO_APIC_VECTOR_LIMIT ( 32 * MAX_IO_APICS ) 143#define CPU_VECTOR_LIMIT (64 * NR_CPUS)
144#define IO_APIC_VECTOR_LIMIT (32 * MAX_IO_APICS)
161 145
162#ifdef CONFIG_X86_IO_APIC 146#if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_PCI_MSI)
163# define CPU_VECTOR_LIMIT (64 * NR_CPUS) 147#define NR_IRQS \
164# define NR_IRQS \
165 (CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ? \ 148 (CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ? \
166 (NR_VECTORS + CPU_VECTOR_LIMIT) : \ 149 (NR_VECTORS + CPU_VECTOR_LIMIT) : \
167 (NR_VECTORS + IO_APIC_VECTOR_LIMIT)) 150 (NR_VECTORS + IO_APIC_VECTOR_LIMIT))
168#else /* !CONFIG_X86_IO_APIC: */ 151#elif defined(CONFIG_X86_IO_APIC)
169# define NR_IRQS NR_IRQS_LEGACY 152#define NR_IRQS (NR_VECTORS + IO_APIC_VECTOR_LIMIT)
153#elif defined(CONFIG_PCI_MSI)
154#define NR_IRQS (NR_VECTORS + CPU_VECTOR_LIMIT)
155#else
156#define NR_IRQS NR_IRQS_LEGACY
170#endif 157#endif
171 158
172#endif /* _ASM_X86_IRQ_VECTORS_H */ 159#endif /* _ASM_X86_IRQ_VECTORS_H */
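The sixteen IRQ0_VECTOR..IRQ15_VECTOR constants collapse into the single ISA_IRQ_VECTOR(irq) macro. A worked check of the arithmetic, assuming FIRST_EXTERNAL_VECTOR is 0x20 as in this tree, so the legacy block still starts at vector 0x30:

	#include <linux/bug.h>
	#include <asm/irq_vectors.h>

	static inline void isa_vector_sanity_check(void)
	{
		/* ((0x20 + 16) & ~15) == 0x30 */
		BUILD_BUG_ON(ISA_IRQ_VECTOR(0)  != 0x30);	/* old IRQ0_VECTOR  */
		BUILD_BUG_ON(ISA_IRQ_VECTOR(15) != 0x3f);	/* old IRQ15_VECTOR */
	}
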
diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
new file mode 100644
index 000000000000..d26075b52885
--- /dev/null
+++ b/arch/x86/include/asm/irqdomain.h
@@ -0,0 +1,63 @@
1#ifndef _ASM_IRQDOMAIN_H
2#define _ASM_IRQDOMAIN_H
3
4#include <linux/irqdomain.h>
5#include <asm/hw_irq.h>
6
7#ifdef CONFIG_X86_LOCAL_APIC
8enum {
9 /* Allocate contiguous CPU vectors */
10 X86_IRQ_ALLOC_CONTIGUOUS_VECTORS = 0x1,
11};
12
13extern struct irq_domain *x86_vector_domain;
14
15extern void init_irq_alloc_info(struct irq_alloc_info *info,
16 const struct cpumask *mask);
17extern void copy_irq_alloc_info(struct irq_alloc_info *dst,
18 struct irq_alloc_info *src);
19#endif /* CONFIG_X86_LOCAL_APIC */
20
21#ifdef CONFIG_X86_IO_APIC
22struct device_node;
23struct irq_data;
24
25enum ioapic_domain_type {
26 IOAPIC_DOMAIN_INVALID,
27 IOAPIC_DOMAIN_LEGACY,
28 IOAPIC_DOMAIN_STRICT,
29 IOAPIC_DOMAIN_DYNAMIC,
30};
31
32struct ioapic_domain_cfg {
33 enum ioapic_domain_type type;
34 const struct irq_domain_ops *ops;
35 struct device_node *dev;
36};
37
38extern const struct irq_domain_ops mp_ioapic_irqdomain_ops;
39
40extern int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
41 unsigned int nr_irqs, void *arg);
42extern void mp_irqdomain_free(struct irq_domain *domain, unsigned int virq,
43 unsigned int nr_irqs);
44extern void mp_irqdomain_activate(struct irq_domain *domain,
45 struct irq_data *irq_data);
46extern void mp_irqdomain_deactivate(struct irq_domain *domain,
47 struct irq_data *irq_data);
48extern int mp_irqdomain_ioapic_idx(struct irq_domain *domain);
49#endif /* CONFIG_X86_IO_APIC */
50
51#ifdef CONFIG_PCI_MSI
52extern void arch_init_msi_domain(struct irq_domain *domain);
53#else
54static inline void arch_init_msi_domain(struct irq_domain *domain) { }
55#endif
56
57#ifdef CONFIG_HT_IRQ
58extern void arch_init_htirq_domain(struct irq_domain *domain);
59#else
60static inline void arch_init_htirq_domain(struct irq_domain *domain) { }
61#endif
62
63#endif
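The mp_irqdomain_* callbacks above are the IO-APIC's side of the new hierarchy. A sketch of how a registration site might plug them in; the configuration values and names are illustrative, but the call matches the mp_register_ioapic() prototype earlier in this patch:

	static struct ioapic_domain_cfg ioapic_cfg = {
		.type = IOAPIC_DOMAIN_DYNAMIC,
		.ops  = &mp_ioapic_irqdomain_ops,
	};

	/* e.g. while parsing the MADT: */
	mp_register_ioapic(ioapic_id, ioapic_phys_addr, gsi_base, &ioapic_cfg);
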
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1f5a86d518db..982dfc3679ad 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -17,11 +17,16 @@
17#define MCG_EXT_CNT(c) (((c) & MCG_EXT_CNT_MASK) >> MCG_EXT_CNT_SHIFT) 17#define MCG_EXT_CNT(c) (((c) & MCG_EXT_CNT_MASK) >> MCG_EXT_CNT_SHIFT)
18#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */ 18#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */
19#define MCG_ELOG_P (1ULL<<26) /* Extended error log supported */ 19#define MCG_ELOG_P (1ULL<<26) /* Extended error log supported */
20#define MCG_LMCE_P (1ULL<<27) /* Local machine check supported */
20 21
21/* MCG_STATUS register defines */ 22/* MCG_STATUS register defines */
22#define MCG_STATUS_RIPV (1ULL<<0) /* restart ip valid */ 23#define MCG_STATUS_RIPV (1ULL<<0) /* restart ip valid */
23#define MCG_STATUS_EIPV (1ULL<<1) /* ip points to correct instruction */ 24#define MCG_STATUS_EIPV (1ULL<<1) /* ip points to correct instruction */
24#define MCG_STATUS_MCIP (1ULL<<2) /* machine check in progress */ 25#define MCG_STATUS_MCIP (1ULL<<2) /* machine check in progress */
26#define MCG_STATUS_LMCES (1ULL<<3) /* LMCE signaled */
27
28/* MCG_EXT_CTL register defines */
29#define MCG_EXT_CTL_LMCE_EN (1ULL<<0) /* Enable LMCE */
25 30
26/* MCi_STATUS register defines */ 31/* MCi_STATUS register defines */
27#define MCI_STATUS_VAL (1ULL<<63) /* valid error */ 32#define MCI_STATUS_VAL (1ULL<<63) /* valid error */
@@ -104,6 +109,7 @@ struct mce_log {
104struct mca_config { 109struct mca_config {
105 bool dont_log_ce; 110 bool dont_log_ce;
106 bool cmci_disabled; 111 bool cmci_disabled;
112 bool lmce_disabled;
107 bool ignore_ce; 113 bool ignore_ce;
108 bool disabled; 114 bool disabled;
109 bool ser; 115 bool ser;
@@ -117,8 +123,19 @@ struct mca_config {
117}; 123};
118 124
119struct mce_vendor_flags { 125struct mce_vendor_flags {
120 __u64 overflow_recov : 1, /* cpuid_ebx(80000007) */ 126 /*
121 __reserved_0 : 63; 127 * overflow recovery cpuid bit indicates that overflow
128 * conditions are not fatal
129 */
130 __u64 overflow_recov : 1,
131
132 /*
133 * SUCCOR stands for S/W UnCorrectable error COntainment
134 * and Recovery. It indicates support for data poisoning
135 * in HW and deferred error interrupts.
136 */
137 succor : 1,
138 __reserved_0 : 62;
122}; 139};
123extern struct mce_vendor_flags mce_flags; 140extern struct mce_vendor_flags mce_flags;
124 141
@@ -168,12 +185,16 @@ void cmci_clear(void);
168void cmci_reenable(void); 185void cmci_reenable(void);
169void cmci_rediscover(void); 186void cmci_rediscover(void);
170void cmci_recheck(void); 187void cmci_recheck(void);
188void lmce_clear(void);
189void lmce_enable(void);
171#else 190#else
172static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { } 191static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
173static inline void cmci_clear(void) {} 192static inline void cmci_clear(void) {}
174static inline void cmci_reenable(void) {} 193static inline void cmci_reenable(void) {}
175static inline void cmci_rediscover(void) {} 194static inline void cmci_rediscover(void) {}
176static inline void cmci_recheck(void) {} 195static inline void cmci_recheck(void) {}
196static inline void lmce_clear(void) {}
197static inline void lmce_enable(void) {}
177#endif 198#endif
178 199
179#ifdef CONFIG_X86_MCE_AMD 200#ifdef CONFIG_X86_MCE_AMD
@@ -223,6 +244,9 @@ void do_machine_check(struct pt_regs *, long);
223extern void (*mce_threshold_vector)(void); 244extern void (*mce_threshold_vector)(void);
224extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu); 245extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu);
225 246
247/* Deferred error interrupt handler */
248extern void (*deferred_error_int_vector)(void);
249
226/* 250/*
227 * Thermal handler 251 * Thermal handler
228 */ 252 */
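The new succor flag advertises hardware data poisoning plus the deferred error interrupt. A sketch of how detection code might populate mce_flags, assuming the CPUID leaf named in the comments above (0x80000007, EBX bits 0 and 1); the function name is made up:

	#include <linux/bitops.h>
	#include <asm/mce.h>
	#include <asm/processor.h>

	static void detect_error_recovery_features(void)
	{
		u32 ebx = cpuid_ebx(0x80000007);

		mce_flags.overflow_recov = !!(ebx & BIT(0));	/* overflow is not fatal  */
		mce_flags.succor	 = !!(ebx & BIT(1));	/* deferred error support */
	}
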
diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
new file mode 100644
index 000000000000..93724cc62177
--- /dev/null
+++ b/arch/x86/include/asm/msi.h
@@ -0,0 +1,7 @@
1#ifndef _ASM_X86_MSI_H
2#define _ASM_X86_MSI_H
3#include <asm/hw_irq.h>
4
5typedef struct irq_alloc_info msi_alloc_info_t;
6
7#endif /* _ASM_X86_MSI_H */
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3c6bb342a48f..9ebc3d009373 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -56,6 +56,7 @@
56#define MSR_IA32_MCG_CAP 0x00000179 56#define MSR_IA32_MCG_CAP 0x00000179
57#define MSR_IA32_MCG_STATUS 0x0000017a 57#define MSR_IA32_MCG_STATUS 0x0000017a
58#define MSR_IA32_MCG_CTL 0x0000017b 58#define MSR_IA32_MCG_CTL 0x0000017b
59#define MSR_IA32_MCG_EXT_CTL 0x000004d0
59 60
60#define MSR_OFFCORE_RSP_0 0x000001a6 61#define MSR_OFFCORE_RSP_0 0x000001a6
61#define MSR_OFFCORE_RSP_1 0x000001a7 62#define MSR_OFFCORE_RSP_1 0x000001a7
@@ -380,6 +381,7 @@
380#define FEATURE_CONTROL_LOCKED (1<<0) 381#define FEATURE_CONTROL_LOCKED (1<<0)
381#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1) 382#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
382#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2) 383#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2)
384#define FEATURE_CONTROL_LMCE (1<<20)
383 385
384#define MSR_IA32_APICBASE 0x0000001b 386#define MSR_IA32_APICBASE 0x0000001b
385#define MSR_IA32_APICBASE_BSP (1<<8) 387#define MSR_IA32_APICBASE_BSP (1<<8)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index de36f22eb0b9..e6a707eb5081 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -1,13 +1,14 @@
1#ifndef _ASM_X86_MSR_H 1#ifndef _ASM_X86_MSR_H
2#define _ASM_X86_MSR_H 2#define _ASM_X86_MSR_H
3 3
4#include <uapi/asm/msr.h> 4#include "msr-index.h"
5 5
6#ifndef __ASSEMBLY__ 6#ifndef __ASSEMBLY__
7 7
8#include <asm/asm.h> 8#include <asm/asm.h>
9#include <asm/errno.h> 9#include <asm/errno.h>
10#include <asm/cpumask.h> 10#include <asm/cpumask.h>
11#include <uapi/asm/msr.h>
11 12
12struct msr { 13struct msr {
13 union { 14 union {
@@ -205,8 +206,13 @@ do { \
205 206
206#endif /* !CONFIG_PARAVIRT */ 207#endif /* !CONFIG_PARAVIRT */
207 208
208#define wrmsrl_safe(msr, val) wrmsr_safe((msr), (u32)(val), \ 209/*
209 (u32)((val) >> 32)) 210 * 64-bit version of wrmsr_safe():
211 */
212static inline int wrmsrl_safe(u32 msr, u64 val)
213{
214 return wrmsr_safe(msr, (u32)val, (u32)(val >> 32));
215}
210 216
211#define write_tsc(low, high) wrmsr(MSR_IA32_TSC, (low), (high)) 217#define write_tsc(low, high) wrmsr(MSR_IA32_TSC, (low), (high))
212 218
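Turning wrmsrl_safe() into a real inline gives it type checking on the 64-bit value while keeping the wrmsr_safe() return code. A hedged example tying it to the MCG_EXT_CTL MSR added in msr-index.h above and the LMCE enable bit from <asm/mce.h>; the helper name is made up:

	static int lmce_try_enable(void)
	{
		u64 ctl;

		if (rdmsrl_safe(MSR_IA32_MCG_EXT_CTL, &ctl))
			return -ENODEV;
		return wrmsrl_safe(MSR_IA32_MCG_EXT_CTL, ctl | MCG_EXT_CTL_LMCE_EN);
	}
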
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f6298419..b94f6f64e23d 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
31 * arch_phys_wc_add and arch_phys_wc_del. 31 * arch_phys_wc_add and arch_phys_wc_del.
32 */ 32 */
33# ifdef CONFIG_MTRR 33# ifdef CONFIG_MTRR
34extern u8 mtrr_type_lookup(u64 addr, u64 end); 34extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
35extern void mtrr_save_fixed_ranges(void *); 35extern void mtrr_save_fixed_ranges(void *);
36extern void mtrr_save_state(void); 36extern void mtrr_save_state(void);
37extern int mtrr_add(unsigned long base, unsigned long size, 37extern int mtrr_add(unsigned long base, unsigned long size,
@@ -48,14 +48,13 @@ extern void mtrr_aps_init(void);
48extern void mtrr_bp_restore(void); 48extern void mtrr_bp_restore(void);
49extern int mtrr_trim_uncached_memory(unsigned long end_pfn); 49extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
50extern int amd_special_default_mtrr(void); 50extern int amd_special_default_mtrr(void);
51extern int phys_wc_to_mtrr_index(int handle);
52# else 51# else
53static inline u8 mtrr_type_lookup(u64 addr, u64 end) 52static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
54{ 53{
55 /* 54 /*
56 * Return no-MTRRs: 55 * Return no-MTRRs:
57 */ 56 */
58 return 0xff; 57 return MTRR_TYPE_INVALID;
59} 58}
60#define mtrr_save_fixed_ranges(arg) do {} while (0) 59#define mtrr_save_fixed_ranges(arg) do {} while (0)
61#define mtrr_save_state() do {} while (0) 60#define mtrr_save_state() do {} while (0)
@@ -84,10 +83,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
84static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi) 83static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
85{ 84{
86} 85}
87static inline int phys_wc_to_mtrr_index(int handle)
88{
89 return -1;
90}
91 86
92#define mtrr_ap_init() do {} while (0) 87#define mtrr_ap_init() do {} while (0)
93#define mtrr_bp_init() do {} while (0) 88#define mtrr_bp_init() do {} while (0)
@@ -127,4 +122,8 @@ struct mtrr_gentry32 {
127 _IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry32) 122 _IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry32)
128#endif /* CONFIG_COMPAT */ 123#endif /* CONFIG_COMPAT */
129 124
125/* Bit fields for enabled in struct mtrr_state_type */
126#define MTRR_STATE_MTRR_FIXED_ENABLED 0x01
127#define MTRR_STATE_MTRR_ENABLED 0x02
128
130#endif /* _ASM_X86_MTRR_H */ 129#endif /* _ASM_X86_MTRR_H */
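mtrr_type_lookup() now reports through the uniform out-parameter whether the whole range is covered by a single MTRR type, and returns MTRR_TYPE_INVALID instead of the old 0xff magic value. A sketch of the new calling convention; the helper name is made up:

	static bool range_is_uniform_wb(u64 start, u64 end)
	{
		u8 uniform;
		u8 type = mtrr_type_lookup(start, end, &uniform);

		return type != MTRR_TYPE_INVALID && uniform &&
		       type == MTRR_TYPE_WRBACK;
	}
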
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8766c7c395c2..a6b8f9fadb06 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -160,13 +160,14 @@ struct pv_cpu_ops {
160 u64 (*read_pmc)(int counter); 160 u64 (*read_pmc)(int counter);
161 unsigned long long (*read_tscp)(unsigned int *aux); 161 unsigned long long (*read_tscp)(unsigned int *aux);
162 162
163#ifdef CONFIG_X86_32
163 /* 164 /*
164 * Atomically enable interrupts and return to userspace. This 165 * Atomically enable interrupts and return to userspace. This
165 * is only ever used to return to 32-bit processes; in a 166 * is only used in 32-bit kernels. 64-bit kernels use
166 * 64-bit kernel, it's used for 32-on-64 compat processes, but 167 * usergs_sysret32 instead.
167 * never native 64-bit processes. (Jump, not call.)
168 */ 168 */
169 void (*irq_enable_sysexit)(void); 169 void (*irq_enable_sysexit)(void);
170#endif
170 171
171 /* 172 /*
172 * Switch to usermode gs and return to 64-bit usermode using 173 * Switch to usermode gs and return to 64-bit usermode using
diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 91bc4ba95f91..ca6c228d5e62 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -4,14 +4,9 @@
4#include <linux/types.h> 4#include <linux/types.h>
5#include <asm/pgtable_types.h> 5#include <asm/pgtable_types.h>
6 6
7#ifdef CONFIG_X86_PAT 7bool pat_enabled(void);
8extern int pat_enabled;
9#else
10static const int pat_enabled;
11#endif
12
13extern void pat_init(void); 8extern void pat_init(void);
14void pat_init_cache_modes(void); 9void pat_init_cache_modes(u64);
15 10
16extern int reserve_memtype(u64 start, u64 end, 11extern int reserve_memtype(u64 start, u64 end,
17 enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm); 12 enum page_cache_mode req_pcm, enum page_cache_mode *ret_pcm);
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 4e370a5d8117..d8c80ff32e8c 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -96,15 +96,10 @@ extern void pci_iommu_alloc(void);
96#ifdef CONFIG_PCI_MSI 96#ifdef CONFIG_PCI_MSI
97/* implemented in arch/x86/kernel/apic/io_apic. */ 97/* implemented in arch/x86/kernel/apic/io_apic. */
98struct msi_desc; 98struct msi_desc;
99void native_compose_msi_msg(struct pci_dev *pdev, unsigned int irq,
100 unsigned int dest, struct msi_msg *msg, u8 hpet_id);
101int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type); 99int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
102void native_teardown_msi_irq(unsigned int irq); 100void native_teardown_msi_irq(unsigned int irq);
103void native_restore_msi_irqs(struct pci_dev *dev); 101void native_restore_msi_irqs(struct pci_dev *dev);
104int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
105 unsigned int irq_base, unsigned int irq_offset);
106#else 102#else
107#define native_compose_msi_msg NULL
108#define native_setup_msi_irqs NULL 103#define native_setup_msi_irqs NULL
109#define native_teardown_msi_irq NULL 104#define native_teardown_msi_irq NULL
110#endif 105#endif
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index fe57e7a98839..2562e303405b 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -398,11 +398,17 @@ static inline int is_new_memtype_allowed(u64 paddr, unsigned long size,
398 * requested memtype: 398 * requested memtype:
399 * - request is uncached, return cannot be write-back 399 * - request is uncached, return cannot be write-back
400 * - request is write-combine, return cannot be write-back 400 * - request is write-combine, return cannot be write-back
401 * - request is write-through, return cannot be write-back
402 * - request is write-through, return cannot be write-combine
401 */ 403 */
402 if ((pcm == _PAGE_CACHE_MODE_UC_MINUS && 404 if ((pcm == _PAGE_CACHE_MODE_UC_MINUS &&
403 new_pcm == _PAGE_CACHE_MODE_WB) || 405 new_pcm == _PAGE_CACHE_MODE_WB) ||
404 (pcm == _PAGE_CACHE_MODE_WC && 406 (pcm == _PAGE_CACHE_MODE_WC &&
405 new_pcm == _PAGE_CACHE_MODE_WB)) { 407 new_pcm == _PAGE_CACHE_MODE_WB) ||
408 (pcm == _PAGE_CACHE_MODE_WT &&
409 new_pcm == _PAGE_CACHE_MODE_WB) ||
410 (pcm == _PAGE_CACHE_MODE_WT &&
411 new_pcm == _PAGE_CACHE_MODE_WC)) {
406 return 0; 412 return 0;
407 } 413 }
408 414
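The two new clauses mean a write-through request may only be weakened, never silently turned into a cacheable type. Illustrative calls (not from the patch), with paddr and size as placeholders:

	int ok;

	/* rejected: a WT request must not end up write-back or write-combining */
	ok = is_new_memtype_allowed(paddr, size, _PAGE_CACHE_MODE_WT,
				    _PAGE_CACHE_MODE_WB);	/* ok == 0 */

	/* still allowed: WT degraded to UC- */
	ok = is_new_memtype_allowed(paddr, size, _PAGE_CACHE_MODE_WT,
				    _PAGE_CACHE_MODE_UC_MINUS);	/* ok == 1 */
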
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 78f0c8cbe316..13f310bfc09a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -367,6 +367,9 @@ extern int nx_enabled;
367#define pgprot_writecombine pgprot_writecombine 367#define pgprot_writecombine pgprot_writecombine
368extern pgprot_t pgprot_writecombine(pgprot_t prot); 368extern pgprot_t pgprot_writecombine(pgprot_t prot);
369 369
370#define pgprot_writethrough pgprot_writethrough
371extern pgprot_t pgprot_writethrough(pgprot_t prot);
372
370/* Indicate that x86 has its own track and untrack pfn vma functions */ 373/* Indicate that x86 has its own track and untrack pfn vma functions */
371#define __HAVE_PFNMAP_TRACKING 374#define __HAVE_PFNMAP_TRACKING
372 375
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index a90f8972dad5..a4a77286cb1d 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -5,12 +5,14 @@
5 5
6/* misc architecture specific prototypes */ 6/* misc architecture specific prototypes */
7 7
8void system_call(void);
9void syscall_init(void); 8void syscall_init(void);
10 9
11void ia32_syscall(void); 10void entry_SYSCALL_64(void);
12void ia32_cstar_target(void); 11void entry_SYSCALL_compat(void);
13void ia32_sysenter_target(void); 12void entry_INT80_32(void);
13void entry_INT80_compat(void);
14void entry_SYSENTER_32(void);
15void entry_SYSENTER_compat(void);
14 16
15void x86_configure_nx(void); 17void x86_configure_nx(void);
16void x86_report_nx(void); 18void x86_report_nx(void);
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index aeb4666e0c0a..2270e41b32fd 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -215,6 +215,44 @@ static inline void clwb(volatile void *__p)
215 : [pax] "a" (p)); 215 : [pax] "a" (p));
216} 216}
217 217
218/**
219 * pcommit_sfence() - persistent commit and fence
220 *
221 * The PCOMMIT instruction ensures that data that has been flushed from the
222 * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
223 * memory and is durable on the DIMM. The primary use case for this is
224 * persistent memory.
225 *
226 * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
227 * with appropriate fencing.
228 *
229 * Example:
230 * void flush_and_commit_buffer(void *vaddr, unsigned int size)
231 * {
232 * unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
233 * void *vend = vaddr + size;
234 * void *p;
235 *
236 * for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
237 * p < vend; p += boot_cpu_data.x86_clflush_size)
238 * clwb(p);
239 *
240 * // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
241 * // MFENCE via mb() also works
242 * wmb();
243 *
244 * // PCOMMIT and the required SFENCE for ordering
245 * pcommit_sfence();
246 * }
247 *
248 * After this function completes the data pointed to by 'vaddr' has been
249 * accepted to memory and will be durable if the 'vaddr' points to persistent
250 * memory.
251 *
252 * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
253 * things we include both the PCOMMIT and the required SFENCE in the
254 * alternatives generated by pcommit_sfence().
255 */
218static inline void pcommit_sfence(void) 256static inline void pcommit_sfence(void)
219{ 257{
220 alternative(ASM_NOP7, 258 alternative(ASM_NOP7,
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index b4bdec3e9523..225ee545e1a0 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -177,8 +177,6 @@ struct thread_info {
177 */ 177 */
178#ifndef __ASSEMBLY__ 178#ifndef __ASSEMBLY__
179 179
180DECLARE_PER_CPU(unsigned long, kernel_stack);
181
182static inline struct thread_info *current_thread_info(void) 180static inline struct thread_info *current_thread_info(void)
183{ 181{
184 return (struct thread_info *)(current_top_of_stack() - THREAD_SIZE); 182 return (struct thread_info *)(current_top_of_stack() - THREAD_SIZE);
@@ -197,9 +195,13 @@ static inline unsigned long current_stack_pointer(void)
197 195
198#else /* !__ASSEMBLY__ */ 196#else /* !__ASSEMBLY__ */
199 197
198#ifdef CONFIG_X86_64
199# define cpu_current_top_of_stack (cpu_tss + TSS_sp0)
200#endif
201
200/* Load thread_info address into "reg" */ 202/* Load thread_info address into "reg" */
201#define GET_THREAD_INFO(reg) \ 203#define GET_THREAD_INFO(reg) \
202 _ASM_MOV PER_CPU_VAR(kernel_stack),reg ; \ 204 _ASM_MOV PER_CPU_VAR(cpu_current_top_of_stack),reg ; \
203 _ASM_SUB $(THREAD_SIZE),reg ; 205 _ASM_SUB $(THREAD_SIZE),reg ;
204 206
205/* 207/*
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 5a77593fdace..0fb46482dfde 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -26,7 +26,7 @@
26#define _ASM_X86_TOPOLOGY_H 26#define _ASM_X86_TOPOLOGY_H
27 27
28#ifdef CONFIG_X86_32 28#ifdef CONFIG_X86_32
29# ifdef CONFIG_X86_HT 29# ifdef CONFIG_SMP
30# define ENABLE_TOPO_DEFINES 30# define ENABLE_TOPO_DEFINES
31# endif 31# endif
32#else 32#else
diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h
index 4cab890007a7..38a09a13a9bc 100644
--- a/arch/x86/include/asm/trace/irq_vectors.h
+++ b/arch/x86/include/asm/trace/irq_vectors.h
@@ -101,6 +101,12 @@ DEFINE_IRQ_VECTOR_EVENT(call_function_single);
101DEFINE_IRQ_VECTOR_EVENT(threshold_apic); 101DEFINE_IRQ_VECTOR_EVENT(threshold_apic);
102 102
103/* 103/*
104 * deferred_error_apic - called when entering/exiting a deferred apic interrupt
105 * vector handler
106 */
107DEFINE_IRQ_VECTOR_EVENT(deferred_error_apic);
108
109/*
104 * thermal_apic - called when entering/exiting a thermal apic interrupt 110 * thermal_apic - called when entering/exiting a thermal apic interrupt
105 * vector handler 111 * vector handler
106 */ 112 */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 4e49d7dff78e..c5380bea2a36 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -108,7 +108,8 @@ extern int panic_on_unrecovered_nmi;
108void math_emulate(struct math_emu_info *); 108void math_emulate(struct math_emu_info *);
109#ifndef CONFIG_X86_32 109#ifndef CONFIG_X86_32
110asmlinkage void smp_thermal_interrupt(void); 110asmlinkage void smp_thermal_interrupt(void);
111asmlinkage void mce_threshold_interrupt(void); 111asmlinkage void smp_threshold_interrupt(void);
112asmlinkage void smp_deferred_error_interrupt(void);
112#endif 113#endif
113 114
114extern enum ctx_state ist_enter(struct pt_regs *regs); 115extern enum ctx_state ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 7c8ad3451988..f5dcb5204dcd 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -59,6 +59,10 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)
59 __put_user_size(*(u32 *)from, (u32 __user *)to, 59 __put_user_size(*(u32 *)from, (u32 __user *)to,
60 4, ret, 4); 60 4, ret, 4);
61 return ret; 61 return ret;
62 case 8:
63 __put_user_size(*(u64 *)from, (u64 __user *)to,
64 8, ret, 8);
65 return ret;
62 } 66 }
63 } 67 }
64 return __copy_to_user_ll(to, from, n); 68 return __copy_to_user_ll(to, from, n);
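
The new case 8 above extends the constant-size dispatch so that 8-byte copies also avoid the generic __copy_to_user_ll() call. The stand-alone sketch below restates that dispatch pattern only; memcpy() stands in for the __put_user_size()/slow-path machinery and none of this is the kernel implementation.

/* Stand-alone sketch of a constant-size fast-path dispatch. */
#include <string.h>

static int copy_small(void *to, const void *from, unsigned long n)
{
	switch (n) {                    /* n is typically a compile-time constant */
	case 1: memcpy(to, from, 1); return 0;
	case 2: memcpy(to, from, 2); return 0;
	case 4: memcpy(to, from, 4); return 0;
	case 8: memcpy(to, from, 8); return 0;  /* the newly handled size */
	}
	memcpy(to, from, n);            /* stand-in for the generic slow path */
	return 0;
}

int main(void)
{
	unsigned long long src = 42, dst = 0;

	return copy_small(&dst, &src, sizeof(src)) || dst != 42;
}

When the size is a compile-time constant the compiler folds the switch away, which is the point of the fast path.
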
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index f58a9c7a3c86..48d34d28f5a6 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -171,38 +171,17 @@ struct x86_platform_ops {
171}; 171};
172 172
173struct pci_dev; 173struct pci_dev;
174struct msi_msg;
175 174
176struct x86_msi_ops { 175struct x86_msi_ops {
177 int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type); 176 int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
178 void (*compose_msi_msg)(struct pci_dev *dev, unsigned int irq,
179 unsigned int dest, struct msi_msg *msg,
180 u8 hpet_id);
181 void (*teardown_msi_irq)(unsigned int irq); 177 void (*teardown_msi_irq)(unsigned int irq);
182 void (*teardown_msi_irqs)(struct pci_dev *dev); 178 void (*teardown_msi_irqs)(struct pci_dev *dev);
183 void (*restore_msi_irqs)(struct pci_dev *dev); 179 void (*restore_msi_irqs)(struct pci_dev *dev);
184 int (*setup_hpet_msi)(unsigned int irq, unsigned int id);
185}; 180};
186 181
187struct IO_APIC_route_entry;
188struct io_apic_irq_attr;
189struct irq_data;
190struct cpumask;
191
192struct x86_io_apic_ops { 182struct x86_io_apic_ops {
193 void (*init) (void);
194 unsigned int (*read) (unsigned int apic, unsigned int reg); 183 unsigned int (*read) (unsigned int apic, unsigned int reg);
195 void (*write) (unsigned int apic, unsigned int reg, unsigned int value);
196 void (*modify) (unsigned int apic, unsigned int reg, unsigned int value);
197 void (*disable)(void); 184 void (*disable)(void);
198 void (*print_entries)(unsigned int apic, unsigned int nr_entries);
199 int (*set_affinity)(struct irq_data *data,
200 const struct cpumask *mask,
201 bool force);
202 int (*setup_entry)(int irq, struct IO_APIC_route_entry *entry,
203 unsigned int destination, int vector,
204 struct io_apic_irq_attr *attr);
205 void (*eoi_ioapic_pin)(int apic, int pin, int vector);
206}; 185};
207 186
208extern struct x86_init_ops x86_init; 187extern struct x86_init_ops x86_init;
diff --git a/arch/x86/include/uapi/asm/msr.h b/arch/x86/include/uapi/asm/msr.h
index 155e51048fa4..c41f4fe25483 100644
--- a/arch/x86/include/uapi/asm/msr.h
+++ b/arch/x86/include/uapi/asm/msr.h
@@ -1,8 +1,6 @@
1#ifndef _UAPI_ASM_X86_MSR_H 1#ifndef _UAPI_ASM_X86_MSR_H
2#define _UAPI_ASM_X86_MSR_H 2#define _UAPI_ASM_X86_MSR_H
3 3
4#include <asm/msr-index.h>
5
6#ifndef __ASSEMBLY__ 4#ifndef __ASSEMBLY__
7 5
8#include <linux/types.h> 6#include <linux/types.h>
diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index d0acb658c8f4..7528dcf59691 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -103,7 +103,7 @@ struct mtrr_state_type {
103#define MTRRIOC_GET_PAGE_ENTRY _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry) 103#define MTRRIOC_GET_PAGE_ENTRY _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry)
104#define MTRRIOC_KILL_PAGE_ENTRY _IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry) 104#define MTRRIOC_KILL_PAGE_ENTRY _IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry)
105 105
106/* These are the region types */ 106/* MTRR memory types, which are defined in SDM */
107#define MTRR_TYPE_UNCACHABLE 0 107#define MTRR_TYPE_UNCACHABLE 0
108#define MTRR_TYPE_WRCOMB 1 108#define MTRR_TYPE_WRCOMB 1
109/*#define MTRR_TYPE_ 2*/ 109/*#define MTRR_TYPE_ 2*/
@@ -113,5 +113,11 @@ struct mtrr_state_type {
113#define MTRR_TYPE_WRBACK 6 113#define MTRR_TYPE_WRBACK 6
114#define MTRR_NUM_TYPES 7 114#define MTRR_NUM_TYPES 7
115 115
116/*
117 * Invalid MTRR memory type. mtrr_type_lookup() returns this value when
118 * MTRRs are disabled. Note, this value is allocated from the reserved
119 * values (0x7-0xff) of the MTRR memory types.
120 */
121#define MTRR_TYPE_INVALID 0xff
116 122
117#endif /* _UAPI_ASM_X86_MTRR_H */ 123#endif /* _UAPI_ASM_X86_MTRR_H */
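
Because MTRR_TYPE_INVALID is taken from the reserved encodings, callers of mtrr_type_lookup() can tell "no MTRR information" apart from a real memory type with a plain comparison. A minimal, hedged illustration follows; the helper and its policy are invented for the example and are not part of the patch.

/* Stand-alone illustration of checking the MTRR_TYPE_INVALID sentinel. */
#include <stdio.h>

#define MTRR_TYPE_WRBACK   6
#define MTRR_TYPE_INVALID  0xff        /* returned when MTRRs are disabled */

static int can_assume_write_back(unsigned char mtrr_type)
{
	if (mtrr_type == MTRR_TYPE_INVALID)
		return 0;              /* no MTRR data: make no assumption */
	return mtrr_type == MTRR_TYPE_WRBACK;
}

int main(void)
{
	printf("%d %d\n", can_assume_write_back(MTRR_TYPE_WRBACK),
	       can_assume_write_back(MTRR_TYPE_INVALID));
	return 0;
}
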
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index febaf180621b..0f15af41bd80 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -22,7 +22,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o := n
22 22
23CFLAGS_irq.o := -I$(src)/../include/asm/trace 23CFLAGS_irq.o := -I$(src)/../include/asm/trace
24 24
25obj-y := process_$(BITS).o signal.o entry_$(BITS).o 25obj-y := process_$(BITS).o signal.o
26obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o 26obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
27obj-y += time.o ioport.o ldt.o dumpstack.o nmi.o 27obj-y += time.o ioport.o ldt.o dumpstack.o nmi.o
28obj-y += setup.o x86_init.o i8259.o irqinit.o jump_label.o 28obj-y += setup.o x86_init.o i8259.o irqinit.o jump_label.o
@@ -31,9 +31,6 @@ obj-y += probe_roms.o
31obj-$(CONFIG_X86_32) += i386_ksyms_32.o 31obj-$(CONFIG_X86_32) += i386_ksyms_32.o
32obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o 32obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
33obj-$(CONFIG_X86_64) += mcount_64.o 33obj-$(CONFIG_X86_64) += mcount_64.o
34obj-y += syscall_$(BITS).o vsyscall_gtod.o
35obj-$(CONFIG_IA32_EMULATION) += syscall_32.o
36obj-$(CONFIG_X86_VSYSCALL_EMULATION) += vsyscall_64.o vsyscall_emu_64.o
37obj-$(CONFIG_X86_ESPFIX64) += espfix_64.o 34obj-$(CONFIG_X86_ESPFIX64) += espfix_64.o
38obj-$(CONFIG_SYSFS) += ksysfs.o 35obj-$(CONFIG_SYSFS) += ksysfs.o
39obj-y += bootflag.o e820.o 36obj-y += bootflag.o e820.o
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index dbe76a14c3c9..e49ee24da85e 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -31,12 +31,12 @@
31#include <linux/module.h> 31#include <linux/module.h>
32#include <linux/dmi.h> 32#include <linux/dmi.h>
33#include <linux/irq.h> 33#include <linux/irq.h>
34#include <linux/irqdomain.h>
35#include <linux/slab.h> 34#include <linux/slab.h>
36#include <linux/bootmem.h> 35#include <linux/bootmem.h>
37#include <linux/ioport.h> 36#include <linux/ioport.h>
38#include <linux/pci.h> 37#include <linux/pci.h>
39 38
39#include <asm/irqdomain.h>
40#include <asm/pci_x86.h> 40#include <asm/pci_x86.h>
41#include <asm/pgtable.h> 41#include <asm/pgtable.h>
42#include <asm/io_apic.h> 42#include <asm/io_apic.h>
@@ -400,57 +400,13 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
400 return 0; 400 return 0;
401} 401}
402 402
403static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
404 int polarity)
405{
406 int irq, node;
407
408 if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
409 return gsi;
410
411 trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
412 polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
413 node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
414 if (mp_set_gsi_attr(gsi, trigger, polarity, node)) {
415 pr_warn("Failed to set pin attr for GSI%d\n", gsi);
416 return -1;
417 }
418
419 irq = mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC);
420 if (irq < 0)
421 return irq;
422
423 /* Don't set up the ACPI SCI because it's already set up */
424 if (enable_update_mptable && acpi_gbl_FADT.sci_interrupt != gsi)
425 mp_config_acpi_gsi(dev, gsi, trigger, polarity);
426
427 return irq;
428}
429
430static void mp_unregister_gsi(u32 gsi)
431{
432 int irq;
433
434 if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
435 return;
436
437 irq = mp_map_gsi_to_irq(gsi, 0);
438 if (irq > 0)
439 mp_unmap_irq(irq);
440}
441
442static struct irq_domain_ops acpi_irqdomain_ops = {
443 .map = mp_irqdomain_map,
444 .unmap = mp_irqdomain_unmap,
445};
446
447static int __init 403static int __init
448acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end) 404acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end)
449{ 405{
450 struct acpi_madt_io_apic *ioapic = NULL; 406 struct acpi_madt_io_apic *ioapic = NULL;
451 struct ioapic_domain_cfg cfg = { 407 struct ioapic_domain_cfg cfg = {
452 .type = IOAPIC_DOMAIN_DYNAMIC, 408 .type = IOAPIC_DOMAIN_DYNAMIC,
453 .ops = &acpi_irqdomain_ops, 409 .ops = &mp_ioapic_irqdomain_ops,
454 }; 410 };
455 411
456 ioapic = (struct acpi_madt_io_apic *)header; 412 ioapic = (struct acpi_madt_io_apic *)header;
@@ -652,7 +608,7 @@ static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
652 * Make sure all (legacy) PCI IRQs are set as level-triggered. 608 * Make sure all (legacy) PCI IRQs are set as level-triggered.
653 */ 609 */
654 if (trigger == ACPI_LEVEL_SENSITIVE) 610 if (trigger == ACPI_LEVEL_SENSITIVE)
655 eisa_set_level_irq(gsi); 611 elcr_set_level_irq(gsi);
656#endif 612#endif
657 613
658 return gsi; 614 return gsi;
@@ -663,10 +619,21 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
663 int trigger, int polarity) 619 int trigger, int polarity)
664{ 620{
665 int irq = gsi; 621 int irq = gsi;
666
667#ifdef CONFIG_X86_IO_APIC 622#ifdef CONFIG_X86_IO_APIC
623 int node;
624 struct irq_alloc_info info;
625
626 node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
627 trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
628 polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
629 ioapic_set_alloc_attr(&info, node, trigger, polarity);
630
668 mutex_lock(&acpi_ioapic_lock); 631 mutex_lock(&acpi_ioapic_lock);
669 irq = mp_register_gsi(dev, gsi, trigger, polarity); 632 irq = mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC, &info);
633 /* Don't set up the ACPI SCI because it's already set up */
634 if (irq >= 0 && enable_update_mptable &&
635 acpi_gbl_FADT.sci_interrupt != gsi)
636 mp_config_acpi_gsi(dev, gsi, trigger, polarity);
670 mutex_unlock(&acpi_ioapic_lock); 637 mutex_unlock(&acpi_ioapic_lock);
671#endif 638#endif
672 639
@@ -676,8 +643,12 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
676static void acpi_unregister_gsi_ioapic(u32 gsi) 643static void acpi_unregister_gsi_ioapic(u32 gsi)
677{ 644{
678#ifdef CONFIG_X86_IO_APIC 645#ifdef CONFIG_X86_IO_APIC
646 int irq;
647
679 mutex_lock(&acpi_ioapic_lock); 648 mutex_lock(&acpi_ioapic_lock);
680 mp_unregister_gsi(gsi); 649 irq = mp_map_gsi_to_irq(gsi, 0, NULL);
650 if (irq > 0)
651 mp_unmap_irq(irq);
681 mutex_unlock(&acpi_ioapic_lock); 652 mutex_unlock(&acpi_ioapic_lock);
682#endif 653#endif
683} 654}
@@ -786,7 +757,7 @@ int acpi_register_ioapic(acpi_handle handle, u64 phys_addr, u32 gsi_base)
786 u64 addr; 757 u64 addr;
787 struct ioapic_domain_cfg cfg = { 758 struct ioapic_domain_cfg cfg = {
788 .type = IOAPIC_DOMAIN_DYNAMIC, 759 .type = IOAPIC_DOMAIN_DYNAMIC,
789 .ops = &acpi_irqdomain_ops, 760 .ops = &mp_ioapic_irqdomain_ops,
790 }; 761 };
791 762
792 ioapic_id = acpi_get_ioapic_id(handle, gsi_base, &addr); 763 ioapic_id = acpi_get_ioapic_id(handle, gsi_base, &addr);
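
The removed mp_register_gsi()/mp_unregister_gsi() helpers are replaced by filling an irq_alloc_info and handing it to mp_map_gsi_to_irq() directly, as the acpi_register_gsi_ioapic() hunk above shows. The fragment below is only a hedged, self-contained restatement of that ordering; the struct and helpers are simplified stand-ins, not the kernel's ioapic_set_alloc_attr() or mp_map_gsi_to_irq().

/* Stand-alone sketch of the new GSI -> IRQ registration ordering. */
#include <stdio.h>

struct alloc_info { int node, trigger, polarity; };

static void set_alloc_attr(struct alloc_info *info,
			   int node, int trigger, int polarity)
{
	info->node = node;             /* NUMA node of the requesting device */
	info->trigger = trigger;       /* 0 = edge, 1 = level */
	info->polarity = polarity;     /* 0 = active high, 1 = active low */
}

static int map_gsi_to_irq(unsigned int gsi, struct alloc_info *info)
{
	/* The real code allocates through the IOAPIC irqdomain here. */
	printf("GSI %u: node %d trigger %d polarity %d\n",
	       gsi, info->node, info->trigger, info->polarity);
	return (int)gsi;               /* pretend GSI == IRQ for the sketch */
}

int main(void)
{
	struct alloc_info info;

	/* Example: level-triggered, active-low PCI interrupt on GSI 19 */
	set_alloc_attr(&info, -1, 1, 1);
	return map_gsi_to_irq(19, &info) < 0;
}
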
diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index ae693b51ed8e..8c35df468104 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -62,7 +62,7 @@ ENTRY(do_suspend_lowlevel)
62 pushfq 62 pushfq
63 popq pt_regs_flags(%rax) 63 popq pt_regs_flags(%rax)
64 64
65 movq $resume_point, saved_rip(%rip) 65 movq $.Lresume_point, saved_rip(%rip)
66 66
67 movq %rsp, saved_rsp 67 movq %rsp, saved_rsp
68 movq %rbp, saved_rbp 68 movq %rbp, saved_rbp
@@ -75,10 +75,10 @@ ENTRY(do_suspend_lowlevel)
75 xorl %eax, %eax 75 xorl %eax, %eax
76 call x86_acpi_enter_sleep_state 76 call x86_acpi_enter_sleep_state
77 /* in case something went wrong, restore the machine status and go on */ 77 /* in case something went wrong, restore the machine status and go on */
78 jmp resume_point 78 jmp .Lresume_point
79 79
80 .align 4 80 .align 4
81resume_point: 81.Lresume_point:
82 /* We don't restore %rax, it must be 0 anyway */ 82 /* We don't restore %rax, it must be 0 anyway */
83 movq $saved_context, %rax 83 movq $saved_context, %rax
84 movq saved_context_cr4(%rax), %rbx 84 movq saved_context_cr4(%rax), %rbx
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 7fe097235376..c42827eb86cf 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -231,6 +231,15 @@ void __init arch_init_ideal_nops(void)
231#endif 231#endif
232 } 232 }
233 break; 233 break;
234
235 case X86_VENDOR_AMD:
236 if (boot_cpu_data.x86 > 0xf) {
237 ideal_nops = p6_nops;
238 return;
239 }
240
241 /* fall through */
242
234 default: 243 default:
235#ifdef CONFIG_X86_64 244#ifdef CONFIG_X86_64
236 ideal_nops = k8_nops; 245 ideal_nops = k8_nops;
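
The added vendor case selects the P6-style multi-byte NOPs for AMD family 0x10 and newer and otherwise falls through to the pre-existing defaults. Below is a hedged, stand-alone restatement of just that decision; the enum and helper are placeholders, not kernel symbols.

/* Stand-alone sketch of the NOP-table selection added above. */
#include <stdio.h>

enum nop_table { NOPS_K8, NOPS_P6 };

static enum nop_table pick_ideal_nops(int is_amd, unsigned int family)
{
	if (is_amd && family > 0xf)
		return NOPS_P6;        /* family 0x10+ per the change above */
	return NOPS_K8;                /* 64-bit default in the original code */
}

int main(void)
{
	printf("%d\n", pick_ideal_nops(1, 0x15) == NOPS_P6);
	return 0;
}
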
diff --git a/arch/x86/kernel/apb_timer.c b/arch/x86/kernel/apb_timer.c
index 6a7c23ff21d3..ede92c3364d3 100644
--- a/arch/x86/kernel/apb_timer.c
+++ b/arch/x86/kernel/apb_timer.c
@@ -171,10 +171,6 @@ static int __init apbt_clockevent_register(void)
171 171
172static void apbt_setup_irq(struct apbt_dev *adev) 172static void apbt_setup_irq(struct apbt_dev *adev)
173{ 173{
174 /* timer0 irq has been setup early */
175 if (adev->irq == 0)
176 return;
177
178 irq_modify_status(adev->irq, 0, IRQ_MOVE_PCNTXT); 174 irq_modify_status(adev->irq, 0, IRQ_MOVE_PCNTXT);
179 irq_set_affinity(adev->irq, cpumask_of(adev->cpu)); 175 irq_set_affinity(adev->irq, cpumask_of(adev->cpu));
180} 176}
diff --git a/arch/x86/kernel/apic/htirq.c b/arch/x86/kernel/apic/htirq.c
index 816f36e979ad..ae50d3454d78 100644
--- a/arch/x86/kernel/apic/htirq.c
+++ b/arch/x86/kernel/apic/htirq.c
@@ -3,6 +3,8 @@
3 * 3 *
4 * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo 4 * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo
5 * Moved from arch/x86/kernel/apic/io_apic.c. 5 * Moved from arch/x86/kernel/apic/io_apic.c.
6 * Jiang Liu <jiang.liu@linux.intel.com>
7 * Add support of hierarchical irqdomain
6 * 8 *
7 * This program is free software; you can redistribute it and/or modify 9 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 10 * it under the terms of the GNU General Public License version 2 as
@@ -14,78 +16,112 @@
14#include <linux/device.h> 16#include <linux/device.h>
15#include <linux/pci.h> 17#include <linux/pci.h>
16#include <linux/htirq.h> 18#include <linux/htirq.h>
19#include <asm/irqdomain.h>
17#include <asm/hw_irq.h> 20#include <asm/hw_irq.h>
18#include <asm/apic.h> 21#include <asm/apic.h>
19#include <asm/hypertransport.h> 22#include <asm/hypertransport.h>
20 23
24static struct irq_domain *htirq_domain;
25
21/* 26/*
22 * Hypertransport interrupt support 27 * Hypertransport interrupt support
23 */ 28 */
24static void target_ht_irq(unsigned int irq, unsigned int dest, u8 vector)
25{
26 struct ht_irq_msg msg;
27
28 fetch_ht_irq_msg(irq, &msg);
29
30 msg.address_lo &= ~(HT_IRQ_LOW_VECTOR_MASK | HT_IRQ_LOW_DEST_ID_MASK);
31 msg.address_hi &= ~(HT_IRQ_HIGH_DEST_ID_MASK);
32
33 msg.address_lo |= HT_IRQ_LOW_VECTOR(vector) | HT_IRQ_LOW_DEST_ID(dest);
34 msg.address_hi |= HT_IRQ_HIGH_DEST_ID(dest);
35
36 write_ht_irq_msg(irq, &msg);
37}
38
39static int 29static int
40ht_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force) 30ht_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
41{ 31{
42 struct irq_cfg *cfg = irqd_cfg(data); 32 struct irq_data *parent = data->parent_data;
43 unsigned int dest;
44 int ret; 33 int ret;
45 34
46 ret = apic_set_affinity(data, mask, &dest); 35 ret = parent->chip->irq_set_affinity(parent, mask, force);
47 if (ret) 36 if (ret >= 0) {
48 return ret; 37 struct ht_irq_msg msg;
49 38 struct irq_cfg *cfg = irqd_cfg(data);
50 target_ht_irq(data->irq, dest, cfg->vector); 39
51 return IRQ_SET_MASK_OK_NOCOPY; 40 fetch_ht_irq_msg(data->irq, &msg);
41 msg.address_lo &= ~(HT_IRQ_LOW_VECTOR_MASK |
42 HT_IRQ_LOW_DEST_ID_MASK);
43 msg.address_lo |= HT_IRQ_LOW_VECTOR(cfg->vector) |
44 HT_IRQ_LOW_DEST_ID(cfg->dest_apicid);
45 msg.address_hi &= ~(HT_IRQ_HIGH_DEST_ID_MASK);
46 msg.address_hi |= HT_IRQ_HIGH_DEST_ID(cfg->dest_apicid);
47 write_ht_irq_msg(data->irq, &msg);
48 }
49
50 return ret;
52} 51}
53 52
54static struct irq_chip ht_irq_chip = { 53static struct irq_chip ht_irq_chip = {
55 .name = "PCI-HT", 54 .name = "PCI-HT",
56 .irq_mask = mask_ht_irq, 55 .irq_mask = mask_ht_irq,
57 .irq_unmask = unmask_ht_irq, 56 .irq_unmask = unmask_ht_irq,
58 .irq_ack = apic_ack_edge, 57 .irq_ack = irq_chip_ack_parent,
59 .irq_set_affinity = ht_set_affinity, 58 .irq_set_affinity = ht_set_affinity,
60 .irq_retrigger = apic_retrigger_irq, 59 .irq_retrigger = irq_chip_retrigger_hierarchy,
61 .flags = IRQCHIP_SKIP_SET_WAKE, 60 .flags = IRQCHIP_SKIP_SET_WAKE,
62}; 61};
63 62
64int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev) 63static int htirq_domain_alloc(struct irq_domain *domain, unsigned int virq,
64 unsigned int nr_irqs, void *arg)
65{ 65{
66 struct irq_cfg *cfg; 66 struct ht_irq_cfg *ht_cfg;
67 struct ht_irq_msg msg; 67 struct irq_alloc_info *info = arg;
68 unsigned dest; 68 struct pci_dev *dev;
69 int err; 69 irq_hw_number_t hwirq;
70 int ret;
70 71
71 if (disable_apic) 72 if (nr_irqs > 1 || !info)
72 return -ENXIO; 73 return -EINVAL;
73 74
74 cfg = irq_cfg(irq); 75 dev = info->ht_dev;
75 err = assign_irq_vector(irq, cfg, apic->target_cpus()); 76 hwirq = (info->ht_idx & 0xFF) |
76 if (err) 77 PCI_DEVID(dev->bus->number, dev->devfn) << 8 |
77 return err; 78 (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 24;
79 if (irq_find_mapping(domain, hwirq) > 0)
80 return -EEXIST;
78 81
79 err = apic->cpu_mask_to_apicid_and(cfg->domain, 82 ht_cfg = kmalloc(sizeof(*ht_cfg), GFP_KERNEL);
80 apic->target_cpus(), &dest); 83 if (!ht_cfg)
81 if (err) 84 return -ENOMEM;
82 return err;
83 85
84 msg.address_hi = HT_IRQ_HIGH_DEST_ID(dest); 86 ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
87 if (ret < 0) {
88 kfree(ht_cfg);
89 return ret;
90 }
91
92 /* Initialize msg to a value that will never match the first write. */
93 ht_cfg->msg.address_lo = 0xffffffff;
94 ht_cfg->msg.address_hi = 0xffffffff;
95 ht_cfg->dev = info->ht_dev;
96 ht_cfg->update = info->ht_update;
97 ht_cfg->pos = info->ht_pos;
98 ht_cfg->idx = 0x10 + (info->ht_idx * 2);
99 irq_domain_set_info(domain, virq, hwirq, &ht_irq_chip, ht_cfg,
100 handle_edge_irq, ht_cfg, "edge");
101
102 return 0;
103}
104
105static void htirq_domain_free(struct irq_domain *domain, unsigned int virq,
106 unsigned int nr_irqs)
107{
108 struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
109
110 BUG_ON(nr_irqs != 1);
111 kfree(irq_data->chip_data);
112 irq_domain_free_irqs_top(domain, virq, nr_irqs);
113}
85 114
115static void htirq_domain_activate(struct irq_domain *domain,
116 struct irq_data *irq_data)
117{
118 struct ht_irq_msg msg;
119 struct irq_cfg *cfg = irqd_cfg(irq_data);
120
121 msg.address_hi = HT_IRQ_HIGH_DEST_ID(cfg->dest_apicid);
86 msg.address_lo = 122 msg.address_lo =
87 HT_IRQ_LOW_BASE | 123 HT_IRQ_LOW_BASE |
88 HT_IRQ_LOW_DEST_ID(dest) | 124 HT_IRQ_LOW_DEST_ID(cfg->dest_apicid) |
89 HT_IRQ_LOW_VECTOR(cfg->vector) | 125 HT_IRQ_LOW_VECTOR(cfg->vector) |
90 ((apic->irq_dest_mode == 0) ? 126 ((apic->irq_dest_mode == 0) ?
91 HT_IRQ_LOW_DM_PHYSICAL : 127 HT_IRQ_LOW_DM_PHYSICAL :
@@ -95,13 +131,56 @@ int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
95 HT_IRQ_LOW_MT_FIXED : 131 HT_IRQ_LOW_MT_FIXED :
96 HT_IRQ_LOW_MT_ARBITRATED) | 132 HT_IRQ_LOW_MT_ARBITRATED) |
97 HT_IRQ_LOW_IRQ_MASKED; 133 HT_IRQ_LOW_IRQ_MASKED;
134 write_ht_irq_msg(irq_data->irq, &msg);
135}
98 136
99 write_ht_irq_msg(irq, &msg); 137static void htirq_domain_deactivate(struct irq_domain *domain,
138 struct irq_data *irq_data)
139{
140 struct ht_irq_msg msg;
100 141
101 irq_set_chip_and_handler_name(irq, &ht_irq_chip, 142 memset(&msg, 0, sizeof(msg));
102 handle_edge_irq, "edge"); 143 write_ht_irq_msg(irq_data->irq, &msg);
144}
103 145
104 dev_dbg(&dev->dev, "irq %d for HT\n", irq); 146static const struct irq_domain_ops htirq_domain_ops = {
147 .alloc = htirq_domain_alloc,
148 .free = htirq_domain_free,
149 .activate = htirq_domain_activate,
150 .deactivate = htirq_domain_deactivate,
151};
105 152
106 return 0; 153void arch_init_htirq_domain(struct irq_domain *parent)
154{
155 if (disable_apic)
156 return;
157
158 htirq_domain = irq_domain_add_tree(NULL, &htirq_domain_ops, NULL);
159 if (!htirq_domain)
160 pr_warn("failed to initialize irqdomain for HTIRQ.\n");
161 else
162 htirq_domain->parent = parent;
163}
164
165int arch_setup_ht_irq(int idx, int pos, struct pci_dev *dev,
166 ht_irq_update_t *update)
167{
168 struct irq_alloc_info info;
169
170 if (!htirq_domain)
171 return -ENOSYS;
172
173 init_irq_alloc_info(&info, NULL);
174 info.ht_idx = idx;
175 info.ht_pos = pos;
176 info.ht_dev = dev;
177 info.ht_update = update;
178
179 return irq_domain_alloc_irqs(htirq_domain, 1, dev_to_node(&dev->dev),
180 &info);
181}
182
183void arch_teardown_ht_irq(unsigned int irq)
184{
185 irq_domain_free_irqs(irq, 1);
107} 186}
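
With HT interrupts converted to a hierarchical irqdomain, drivers no longer pick a vector themselves: arch_setup_ht_irq() packs its parameters into an irq_alloc_info and the htirq domain forwards the allocation to its parent vector domain. The sketch below illustrates only that parent/child shape with local stand-in types; none of the names are the kernel's.

/* Stand-alone sketch of hierarchical IRQ allocation: the child domain
 * forwards the request to its parent before doing its own setup. */
#include <stdio.h>

struct domain {
	const char *name;
	struct domain *parent;
};

struct alloc_info { int idx, pos; };

static int domain_alloc(struct domain *d, struct alloc_info *info)
{
	if (d->parent && domain_alloc(d->parent, info) < 0)
		return -1;             /* parent (e.g. vector) allocation failed */
	printf("%s: allocated for idx %d\n", d->name, info->idx);
	return 0;                      /* child-specific setup would go here */
}

int main(void)
{
	struct domain vector = { "vector-domain", NULL };
	struct domain htirq  = { "htirq-domain", &vector };
	struct alloc_info info = { .idx = 0, .pos = 0x50 };

	return domain_alloc(&htirq, &info);
}
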
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index f4dc2462a1ac..845dc0df2002 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -18,6 +18,16 @@
18 * and Rolf G. Tews 18 * and Rolf G. Tews
19 * for testing these extensively 19 * for testing these extensively
20 * Paul Diefenbaugh : Added full ACPI support 20 * Paul Diefenbaugh : Added full ACPI support
21 *
22 * Historical information which is worth to be preserved:
23 *
24 * - SiS APIC rmw bug:
25 *
26 * We used to have a workaround for a bug in SiS chips which
27 * required to rewrite the index register for a read-modify-write
28 * operation as the chip lost the index information which was
29 * setup for the read already. We cache the data now, so that
30 * workaround has been removed.
21 */ 31 */
22 32
23#include <linux/mm.h> 33#include <linux/mm.h>
@@ -31,13 +41,13 @@
31#include <linux/acpi.h> 41#include <linux/acpi.h>
32#include <linux/module.h> 42#include <linux/module.h>
33#include <linux/syscore_ops.h> 43#include <linux/syscore_ops.h>
34#include <linux/irqdomain.h>
35#include <linux/freezer.h> 44#include <linux/freezer.h>
36#include <linux/kthread.h> 45#include <linux/kthread.h>
37#include <linux/jiffies.h> /* time_after() */ 46#include <linux/jiffies.h> /* time_after() */
38#include <linux/slab.h> 47#include <linux/slab.h>
39#include <linux/bootmem.h> 48#include <linux/bootmem.h>
40 49
50#include <asm/irqdomain.h>
41#include <asm/idle.h> 51#include <asm/idle.h>
42#include <asm/io.h> 52#include <asm/io.h>
43#include <asm/smp.h> 53#include <asm/smp.h>
@@ -63,27 +73,31 @@
63#define for_each_ioapic_pin(idx, pin) \ 73#define for_each_ioapic_pin(idx, pin) \
64 for_each_ioapic((idx)) \ 74 for_each_ioapic((idx)) \
65 for_each_pin((idx), (pin)) 75 for_each_pin((idx), (pin))
66
67#define for_each_irq_pin(entry, head) \ 76#define for_each_irq_pin(entry, head) \
68 list_for_each_entry(entry, &head, list) 77 list_for_each_entry(entry, &head, list)
69 78
70/*
71 * Is the SiS APIC rmw bug present ?
72 * -1 = don't know, 0 = no, 1 = yes
73 */
74int sis_apic_bug = -1;
75
76static DEFINE_RAW_SPINLOCK(ioapic_lock); 79static DEFINE_RAW_SPINLOCK(ioapic_lock);
77static DEFINE_MUTEX(ioapic_mutex); 80static DEFINE_MUTEX(ioapic_mutex);
78static unsigned int ioapic_dynirq_base; 81static unsigned int ioapic_dynirq_base;
79static int ioapic_initialized; 82static int ioapic_initialized;
80 83
81struct mp_pin_info { 84struct irq_pin_list {
85 struct list_head list;
86 int apic, pin;
87};
88
89struct mp_chip_data {
90 struct list_head irq_2_pin;
91 struct IO_APIC_route_entry entry;
82 int trigger; 92 int trigger;
83 int polarity; 93 int polarity;
84 int node;
85 int set;
86 u32 count; 94 u32 count;
95 bool isa_irq;
96};
97
98struct mp_ioapic_gsi {
99 u32 gsi_base;
100 u32 gsi_end;
87}; 101};
88 102
89static struct ioapic { 103static struct ioapic {
@@ -101,7 +115,6 @@ static struct ioapic {
101 struct mp_ioapic_gsi gsi_config; 115 struct mp_ioapic_gsi gsi_config;
102 struct ioapic_domain_cfg irqdomain_cfg; 116 struct ioapic_domain_cfg irqdomain_cfg;
103 struct irq_domain *irqdomain; 117 struct irq_domain *irqdomain;
104 struct mp_pin_info *pin_info;
105 struct resource *iomem_res; 118 struct resource *iomem_res;
106} ioapics[MAX_IO_APICS]; 119} ioapics[MAX_IO_APICS];
107 120
@@ -117,7 +130,7 @@ unsigned int mpc_ioapic_addr(int ioapic_idx)
117 return ioapics[ioapic_idx].mp_config.apicaddr; 130 return ioapics[ioapic_idx].mp_config.apicaddr;
118} 131}
119 132
120struct mp_ioapic_gsi *mp_ioapic_gsi_routing(int ioapic_idx) 133static inline struct mp_ioapic_gsi *mp_ioapic_gsi_routing(int ioapic_idx)
121{ 134{
122 return &ioapics[ioapic_idx].gsi_config; 135 return &ioapics[ioapic_idx].gsi_config;
123} 136}
@@ -129,11 +142,16 @@ static inline int mp_ioapic_pin_count(int ioapic)
129 return gsi_cfg->gsi_end - gsi_cfg->gsi_base + 1; 142 return gsi_cfg->gsi_end - gsi_cfg->gsi_base + 1;
130} 143}
131 144
132u32 mp_pin_to_gsi(int ioapic, int pin) 145static inline u32 mp_pin_to_gsi(int ioapic, int pin)
133{ 146{
134 return mp_ioapic_gsi_routing(ioapic)->gsi_base + pin; 147 return mp_ioapic_gsi_routing(ioapic)->gsi_base + pin;
135} 148}
136 149
150static inline bool mp_is_legacy_irq(int irq)
151{
152 return irq >= 0 && irq < nr_legacy_irqs();
153}
154
137/* 155/*
138 * Initialize all legacy IRQs and all pins on the first IOAPIC 156 * Initialize all legacy IRQs and all pins on the first IOAPIC
139 * if we have legacy interrupt controller. Kernel boot option "pirq=" 157 * if we have legacy interrupt controller. Kernel boot option "pirq="
@@ -144,12 +162,7 @@ static inline int mp_init_irq_at_boot(int ioapic, int irq)
144 if (!nr_legacy_irqs()) 162 if (!nr_legacy_irqs())
145 return 0; 163 return 0;
146 164
147 return ioapic == 0 || (irq >= 0 && irq < nr_legacy_irqs()); 165 return ioapic == 0 || mp_is_legacy_irq(irq);
148}
149
150static inline struct mp_pin_info *mp_pin_info(int ioapic_idx, int pin)
151{
152 return ioapics[ioapic_idx].pin_info + pin;
153} 166}
154 167
155static inline struct irq_domain *mp_ioapic_irqdomain(int ioapic) 168static inline struct irq_domain *mp_ioapic_irqdomain(int ioapic)
@@ -216,16 +229,6 @@ void mp_save_irq(struct mpc_intsrc *m)
216 panic("Max # of irq sources exceeded!!\n"); 229 panic("Max # of irq sources exceeded!!\n");
217} 230}
218 231
219struct irq_pin_list {
220 struct list_head list;
221 int apic, pin;
222};
223
224static struct irq_pin_list *alloc_irq_pin_list(int node)
225{
226 return kzalloc_node(sizeof(struct irq_pin_list), GFP_KERNEL, node);
227}
228
229static void alloc_ioapic_saved_registers(int idx) 232static void alloc_ioapic_saved_registers(int idx)
230{ 233{
231 size_t size; 234 size_t size;
@@ -247,8 +250,7 @@ static void free_ioapic_saved_registers(int idx)
247 250
248int __init arch_early_ioapic_init(void) 251int __init arch_early_ioapic_init(void)
249{ 252{
250 struct irq_cfg *cfg; 253 int i;
251 int i, node = cpu_to_node(0);
252 254
253 if (!nr_legacy_irqs()) 255 if (!nr_legacy_irqs())
254 io_apic_irqs = ~0UL; 256 io_apic_irqs = ~0UL;
@@ -256,16 +258,6 @@ int __init arch_early_ioapic_init(void)
256 for_each_ioapic(i) 258 for_each_ioapic(i)
257 alloc_ioapic_saved_registers(i); 259 alloc_ioapic_saved_registers(i);
258 260
259 /*
260 * For legacy IRQ's, start with assigning irq0 to irq15 to
261 * IRQ0_VECTOR to IRQ15_VECTOR for all cpu's.
262 */
263 for (i = 0; i < nr_legacy_irqs(); i++) {
264 cfg = alloc_irq_and_cfg_at(i, node);
265 cfg->vector = IRQ0_VECTOR + i;
266 cpumask_setall(cfg->domain);
267 }
268
269 return 0; 261 return 0;
270} 262}
271 263
@@ -283,7 +275,7 @@ static __attribute_const__ struct io_apic __iomem *io_apic_base(int idx)
283 + (mpc_ioapic_addr(idx) & ~PAGE_MASK); 275 + (mpc_ioapic_addr(idx) & ~PAGE_MASK);
284} 276}
285 277
286void io_apic_eoi(unsigned int apic, unsigned int vector) 278static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
287{ 279{
288 struct io_apic __iomem *io_apic = io_apic_base(apic); 280 struct io_apic __iomem *io_apic = io_apic_base(apic);
289 writel(vector, &io_apic->eoi); 281 writel(vector, &io_apic->eoi);
@@ -296,7 +288,8 @@ unsigned int native_io_apic_read(unsigned int apic, unsigned int reg)
296 return readl(&io_apic->data); 288 return readl(&io_apic->data);
297} 289}
298 290
299void native_io_apic_write(unsigned int apic, unsigned int reg, unsigned int value) 291static void io_apic_write(unsigned int apic, unsigned int reg,
292 unsigned int value)
300{ 293{
301 struct io_apic __iomem *io_apic = io_apic_base(apic); 294 struct io_apic __iomem *io_apic = io_apic_base(apic);
302 295
@@ -304,21 +297,6 @@ void native_io_apic_write(unsigned int apic, unsigned int reg, unsigned int valu
304 writel(value, &io_apic->data); 297 writel(value, &io_apic->data);
305} 298}
306 299
307/*
308 * Re-write a value: to be used for read-modify-write
309 * cycles where the read already set up the index register.
310 *
311 * Older SiS APIC requires we rewrite the index register
312 */
313void native_io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
314{
315 struct io_apic __iomem *io_apic = io_apic_base(apic);
316
317 if (sis_apic_bug)
318 writel(reg, &io_apic->index);
319 writel(value, &io_apic->data);
320}
321
322union entry_union { 300union entry_union {
323 struct { u32 w1, w2; }; 301 struct { u32 w1, w2; };
324 struct IO_APIC_route_entry entry; 302 struct IO_APIC_route_entry entry;
@@ -378,7 +356,7 @@ static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
378static void ioapic_mask_entry(int apic, int pin) 356static void ioapic_mask_entry(int apic, int pin)
379{ 357{
380 unsigned long flags; 358 unsigned long flags;
381 union entry_union eu = { .entry.mask = 1 }; 359 union entry_union eu = { .entry.mask = IOAPIC_MASKED };
382 360
383 raw_spin_lock_irqsave(&ioapic_lock, flags); 361 raw_spin_lock_irqsave(&ioapic_lock, flags);
384 io_apic_write(apic, 0x10 + 2*pin, eu.w1); 362 io_apic_write(apic, 0x10 + 2*pin, eu.w1);
@@ -391,16 +369,17 @@ static void ioapic_mask_entry(int apic, int pin)
391 * shared ISA-space IRQs, so we have to support them. We are super 369 * shared ISA-space IRQs, so we have to support them. We are super
392 * fast in the common case, and fast for shared ISA-space IRQs. 370 * fast in the common case, and fast for shared ISA-space IRQs.
393 */ 371 */
394static int __add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pin) 372static int __add_pin_to_irq_node(struct mp_chip_data *data,
373 int node, int apic, int pin)
395{ 374{
396 struct irq_pin_list *entry; 375 struct irq_pin_list *entry;
397 376
398 /* don't allow duplicates */ 377 /* don't allow duplicates */
399 for_each_irq_pin(entry, cfg->irq_2_pin) 378 for_each_irq_pin(entry, data->irq_2_pin)
400 if (entry->apic == apic && entry->pin == pin) 379 if (entry->apic == apic && entry->pin == pin)
401 return 0; 380 return 0;
402 381
403 entry = alloc_irq_pin_list(node); 382 entry = kzalloc_node(sizeof(struct irq_pin_list), GFP_ATOMIC, node);
404 if (!entry) { 383 if (!entry) {
405 pr_err("can not alloc irq_pin_list (%d,%d,%d)\n", 384 pr_err("can not alloc irq_pin_list (%d,%d,%d)\n",
406 node, apic, pin); 385 node, apic, pin);
@@ -408,16 +387,16 @@ static int __add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pi
408 } 387 }
409 entry->apic = apic; 388 entry->apic = apic;
410 entry->pin = pin; 389 entry->pin = pin;
390 list_add_tail(&entry->list, &data->irq_2_pin);
411 391
412 list_add_tail(&entry->list, &cfg->irq_2_pin);
413 return 0; 392 return 0;
414} 393}
415 394
416static void __remove_pin_from_irq(struct irq_cfg *cfg, int apic, int pin) 395static void __remove_pin_from_irq(struct mp_chip_data *data, int apic, int pin)
417{ 396{
418 struct irq_pin_list *tmp, *entry; 397 struct irq_pin_list *tmp, *entry;
419 398
420 list_for_each_entry_safe(entry, tmp, &cfg->irq_2_pin, list) 399 list_for_each_entry_safe(entry, tmp, &data->irq_2_pin, list)
421 if (entry->apic == apic && entry->pin == pin) { 400 if (entry->apic == apic && entry->pin == pin) {
422 list_del(&entry->list); 401 list_del(&entry->list);
423 kfree(entry); 402 kfree(entry);
@@ -425,22 +404,23 @@ static void __remove_pin_from_irq(struct irq_cfg *cfg, int apic, int pin)
425 } 404 }
426} 405}
427 406
428static void add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pin) 407static void add_pin_to_irq_node(struct mp_chip_data *data,
408 int node, int apic, int pin)
429{ 409{
430 if (__add_pin_to_irq_node(cfg, node, apic, pin)) 410 if (__add_pin_to_irq_node(data, node, apic, pin))
431 panic("IO-APIC: failed to add irq-pin. Can not proceed\n"); 411 panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
432} 412}
433 413
434/* 414/*
435 * Reroute an IRQ to a different pin. 415 * Reroute an IRQ to a different pin.
436 */ 416 */
437static void __init replace_pin_at_irq_node(struct irq_cfg *cfg, int node, 417static void __init replace_pin_at_irq_node(struct mp_chip_data *data, int node,
438 int oldapic, int oldpin, 418 int oldapic, int oldpin,
439 int newapic, int newpin) 419 int newapic, int newpin)
440{ 420{
441 struct irq_pin_list *entry; 421 struct irq_pin_list *entry;
442 422
443 for_each_irq_pin(entry, cfg->irq_2_pin) { 423 for_each_irq_pin(entry, data->irq_2_pin) {
444 if (entry->apic == oldapic && entry->pin == oldpin) { 424 if (entry->apic == oldapic && entry->pin == oldpin) {
445 entry->apic = newapic; 425 entry->apic = newapic;
446 entry->pin = newpin; 426 entry->pin = newpin;
@@ -450,32 +430,26 @@ static void __init replace_pin_at_irq_node(struct irq_cfg *cfg, int node,
450 } 430 }
451 431
452 /* old apic/pin didn't exist, so just add new ones */ 432 /* old apic/pin didn't exist, so just add new ones */
453 add_pin_to_irq_node(cfg, node, newapic, newpin); 433 add_pin_to_irq_node(data, node, newapic, newpin);
454}
455
456static void __io_apic_modify_irq(struct irq_pin_list *entry,
457 int mask_and, int mask_or,
458 void (*final)(struct irq_pin_list *entry))
459{
460 unsigned int reg, pin;
461
462 pin = entry->pin;
463 reg = io_apic_read(entry->apic, 0x10 + pin * 2);
464 reg &= mask_and;
465 reg |= mask_or;
466 io_apic_modify(entry->apic, 0x10 + pin * 2, reg);
467 if (final)
468 final(entry);
469} 434}
470 435
471static void io_apic_modify_irq(struct irq_cfg *cfg, 436static void io_apic_modify_irq(struct mp_chip_data *data,
472 int mask_and, int mask_or, 437 int mask_and, int mask_or,
473 void (*final)(struct irq_pin_list *entry)) 438 void (*final)(struct irq_pin_list *entry))
474{ 439{
440 union entry_union eu;
475 struct irq_pin_list *entry; 441 struct irq_pin_list *entry;
476 442
477 for_each_irq_pin(entry, cfg->irq_2_pin) 443 eu.entry = data->entry;
478 __io_apic_modify_irq(entry, mask_and, mask_or, final); 444 eu.w1 &= mask_and;
445 eu.w1 |= mask_or;
446 data->entry = eu.entry;
447
448 for_each_irq_pin(entry, data->irq_2_pin) {
449 io_apic_write(entry->apic, 0x10 + 2 * entry->pin, eu.w1);
450 if (final)
451 final(entry);
452 }
479} 453}
480 454
481static void io_apic_sync(struct irq_pin_list *entry) 455static void io_apic_sync(struct irq_pin_list *entry)
@@ -490,39 +464,31 @@ static void io_apic_sync(struct irq_pin_list *entry)
490 readl(&io_apic->data); 464 readl(&io_apic->data);
491} 465}
492 466
493static void mask_ioapic(struct irq_cfg *cfg) 467static void mask_ioapic_irq(struct irq_data *irq_data)
494{ 468{
469 struct mp_chip_data *data = irq_data->chip_data;
495 unsigned long flags; 470 unsigned long flags;
496 471
497 raw_spin_lock_irqsave(&ioapic_lock, flags); 472 raw_spin_lock_irqsave(&ioapic_lock, flags);
498 io_apic_modify_irq(cfg, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync); 473 io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
499 raw_spin_unlock_irqrestore(&ioapic_lock, flags); 474 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
500} 475}
501 476
502static void mask_ioapic_irq(struct irq_data *data) 477static void __unmask_ioapic(struct mp_chip_data *data)
503{ 478{
504 mask_ioapic(irqd_cfg(data)); 479 io_apic_modify_irq(data, ~IO_APIC_REDIR_MASKED, 0, NULL);
505} 480}
506 481
507static void __unmask_ioapic(struct irq_cfg *cfg) 482static void unmask_ioapic_irq(struct irq_data *irq_data)
508{
509 io_apic_modify_irq(cfg, ~IO_APIC_REDIR_MASKED, 0, NULL);
510}
511
512static void unmask_ioapic(struct irq_cfg *cfg)
513{ 483{
484 struct mp_chip_data *data = irq_data->chip_data;
514 unsigned long flags; 485 unsigned long flags;
515 486
516 raw_spin_lock_irqsave(&ioapic_lock, flags); 487 raw_spin_lock_irqsave(&ioapic_lock, flags);
517 __unmask_ioapic(cfg); 488 __unmask_ioapic(data);
518 raw_spin_unlock_irqrestore(&ioapic_lock, flags); 489 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
519} 490}
520 491
521static void unmask_ioapic_irq(struct irq_data *data)
522{
523 unmask_ioapic(irqd_cfg(data));
524}
525
526/* 492/*
527 * IO-APIC versions below 0x20 don't support EOI register. 493 * IO-APIC versions below 0x20 don't support EOI register.
528 * For the record, here is the information about various versions: 494 * For the record, here is the information about various versions:
@@ -539,7 +505,7 @@ static void unmask_ioapic_irq(struct irq_data *data)
539 * Otherwise, we simulate the EOI message manually by changing the trigger 505 * Otherwise, we simulate the EOI message manually by changing the trigger
540 * mode to edge and then back to level, with RTE being masked during this. 506 * mode to edge and then back to level, with RTE being masked during this.
541 */ 507 */
542void native_eoi_ioapic_pin(int apic, int pin, int vector) 508static void __eoi_ioapic_pin(int apic, int pin, int vector)
543{ 509{
544 if (mpc_ioapic_ver(apic) >= 0x20) { 510 if (mpc_ioapic_ver(apic) >= 0x20) {
545 io_apic_eoi(apic, vector); 511 io_apic_eoi(apic, vector);
@@ -551,7 +517,7 @@ void native_eoi_ioapic_pin(int apic, int pin, int vector)
551 /* 517 /*
552 * Mask the entry and change the trigger mode to edge. 518 * Mask the entry and change the trigger mode to edge.
553 */ 519 */
554 entry1.mask = 1; 520 entry1.mask = IOAPIC_MASKED;
555 entry1.trigger = IOAPIC_EDGE; 521 entry1.trigger = IOAPIC_EDGE;
556 522
557 __ioapic_write_entry(apic, pin, entry1); 523 __ioapic_write_entry(apic, pin, entry1);
@@ -563,15 +529,14 @@ void native_eoi_ioapic_pin(int apic, int pin, int vector)
563 } 529 }
564} 530}
565 531
566void eoi_ioapic_irq(unsigned int irq, struct irq_cfg *cfg) 532void eoi_ioapic_pin(int vector, struct mp_chip_data *data)
567{ 533{
568 struct irq_pin_list *entry;
569 unsigned long flags; 534 unsigned long flags;
535 struct irq_pin_list *entry;
570 536
571 raw_spin_lock_irqsave(&ioapic_lock, flags); 537 raw_spin_lock_irqsave(&ioapic_lock, flags);
572 for_each_irq_pin(entry, cfg->irq_2_pin) 538 for_each_irq_pin(entry, data->irq_2_pin)
573 x86_io_apic_ops.eoi_ioapic_pin(entry->apic, entry->pin, 539 __eoi_ioapic_pin(entry->apic, entry->pin, vector);
574 cfg->vector);
575 raw_spin_unlock_irqrestore(&ioapic_lock, flags); 540 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
576} 541}
577 542
@@ -588,8 +553,8 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
588 * Make sure the entry is masked and re-read the contents to check 553 * Make sure the entry is masked and re-read the contents to check
589 * if it is a level triggered pin and if the remote-IRR is set. 554 * if it is a level triggered pin and if the remote-IRR is set.
590 */ 555 */
591 if (!entry.mask) { 556 if (entry.mask == IOAPIC_UNMASKED) {
592 entry.mask = 1; 557 entry.mask = IOAPIC_MASKED;
593 ioapic_write_entry(apic, pin, entry); 558 ioapic_write_entry(apic, pin, entry);
594 entry = ioapic_read_entry(apic, pin); 559 entry = ioapic_read_entry(apic, pin);
595 } 560 }
@@ -602,13 +567,12 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
602 * doesn't clear the remote-IRR if the trigger mode is not 567 * doesn't clear the remote-IRR if the trigger mode is not
603 * set to level. 568 * set to level.
604 */ 569 */
605 if (!entry.trigger) { 570 if (entry.trigger == IOAPIC_EDGE) {
606 entry.trigger = IOAPIC_LEVEL; 571 entry.trigger = IOAPIC_LEVEL;
607 ioapic_write_entry(apic, pin, entry); 572 ioapic_write_entry(apic, pin, entry);
608 } 573 }
609
610 raw_spin_lock_irqsave(&ioapic_lock, flags); 574 raw_spin_lock_irqsave(&ioapic_lock, flags);
611 x86_io_apic_ops.eoi_ioapic_pin(apic, pin, entry.vector); 575 __eoi_ioapic_pin(apic, pin, entry.vector);
612 raw_spin_unlock_irqrestore(&ioapic_lock, flags); 576 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
613 } 577 }
614 578
@@ -706,8 +670,8 @@ void mask_ioapic_entries(void)
706 struct IO_APIC_route_entry entry; 670 struct IO_APIC_route_entry entry;
707 671
708 entry = ioapics[apic].saved_registers[pin]; 672 entry = ioapics[apic].saved_registers[pin];
709 if (!entry.mask) { 673 if (entry.mask == IOAPIC_UNMASKED) {
710 entry.mask = 1; 674 entry.mask = IOAPIC_MASKED;
711 ioapic_write_entry(apic, pin, entry); 675 ioapic_write_entry(apic, pin, entry);
712 } 676 }
713 } 677 }
@@ -809,11 +773,11 @@ static int EISA_ELCR(unsigned int irq)
809 773
810#endif 774#endif
811 775
812/* ISA interrupts are always polarity zero edge triggered, 776/* ISA interrupts are always active high edge triggered,
813 * when listed as conforming in the MP table. */ 777 * when listed as conforming in the MP table. */
814 778
815#define default_ISA_trigger(idx) (0) 779#define default_ISA_trigger(idx) (IOAPIC_EDGE)
816#define default_ISA_polarity(idx) (0) 780#define default_ISA_polarity(idx) (IOAPIC_POL_HIGH)
817 781
818/* EISA interrupts are always polarity zero and can be edge or level 782/* EISA interrupts are always polarity zero and can be edge or level
819 * trigger depending on the ELCR value. If an interrupt is listed as 783 * trigger depending on the ELCR value. If an interrupt is listed as
@@ -823,53 +787,55 @@ static int EISA_ELCR(unsigned int irq)
823#define default_EISA_trigger(idx) (EISA_ELCR(mp_irqs[idx].srcbusirq)) 787#define default_EISA_trigger(idx) (EISA_ELCR(mp_irqs[idx].srcbusirq))
824#define default_EISA_polarity(idx) default_ISA_polarity(idx) 788#define default_EISA_polarity(idx) default_ISA_polarity(idx)
825 789
826/* PCI interrupts are always polarity one level triggered, 790/* PCI interrupts are always active low level triggered,
827 * when listed as conforming in the MP table. */ 791 * when listed as conforming in the MP table. */
828 792
829#define default_PCI_trigger(idx) (1) 793#define default_PCI_trigger(idx) (IOAPIC_LEVEL)
830#define default_PCI_polarity(idx) (1) 794#define default_PCI_polarity(idx) (IOAPIC_POL_LOW)
831 795
832static int irq_polarity(int idx) 796static int irq_polarity(int idx)
833{ 797{
834 int bus = mp_irqs[idx].srcbus; 798 int bus = mp_irqs[idx].srcbus;
835 int polarity;
836 799
837 /* 800 /*
838 * Determine IRQ line polarity (high active or low active): 801 * Determine IRQ line polarity (high active or low active):
839 */ 802 */
840 switch (mp_irqs[idx].irqflag & 3) 803 switch (mp_irqs[idx].irqflag & 0x03) {
841 { 804 case 0:
842 case 0: /* conforms, ie. bus-type dependent polarity */ 805 /* conforms to spec, ie. bus-type dependent polarity */
843 if (test_bit(bus, mp_bus_not_pci)) 806 if (test_bit(bus, mp_bus_not_pci))
844 polarity = default_ISA_polarity(idx); 807 return default_ISA_polarity(idx);
845 else 808 else
846 polarity = default_PCI_polarity(idx); 809 return default_PCI_polarity(idx);
847 break; 810 case 1:
848 case 1: /* high active */ 811 return IOAPIC_POL_HIGH;
849 { 812 case 2:
850 polarity = 0; 813 pr_warn("IOAPIC: Invalid polarity: 2, defaulting to low\n");
851 break; 814 case 3:
852 } 815 default: /* Pointless default required due to do gcc stupidity */
853 case 2: /* reserved */ 816 return IOAPIC_POL_LOW;
854 { 817 }
855 pr_warn("broken BIOS!!\n"); 818}
856 polarity = 1; 819
857 break; 820#ifdef CONFIG_EISA
858 } 821static int eisa_irq_trigger(int idx, int bus, int trigger)
859 case 3: /* low active */ 822{
860 { 823 switch (mp_bus_id_to_type[bus]) {
861 polarity = 1; 824 case MP_BUS_PCI:
862 break; 825 case MP_BUS_ISA:
863 } 826 return trigger;
864 default: /* invalid */ 827 case MP_BUS_EISA:
865 { 828 return default_EISA_trigger(idx);
866 pr_warn("broken BIOS!!\n");
867 polarity = 1;
868 break;
869 }
870 } 829 }
871 return polarity; 830 pr_warn("IOAPIC: Invalid srcbus: %d defaulting to level\n", bus);
831 return IOAPIC_LEVEL;
872} 832}
833#else
834static inline int eisa_irq_trigger(int idx, int bus, int trigger)
835{
836 return trigger;
837}
838#endif
873 839
874static int irq_trigger(int idx) 840static int irq_trigger(int idx)
875{ 841{
@@ -879,153 +845,227 @@ static int irq_trigger(int idx)
879 /* 845 /*
880 * Determine IRQ trigger mode (edge or level sensitive): 846 * Determine IRQ trigger mode (edge or level sensitive):
881 */ 847 */
882 switch ((mp_irqs[idx].irqflag>>2) & 3) 848 switch ((mp_irqs[idx].irqflag >> 2) & 0x03) {
883 { 849 case 0:
884 case 0: /* conforms, ie. bus-type dependent */ 850 /* conforms to spec, ie. bus-type dependent trigger mode */
885 if (test_bit(bus, mp_bus_not_pci)) 851 if (test_bit(bus, mp_bus_not_pci))
886 trigger = default_ISA_trigger(idx); 852 trigger = default_ISA_trigger(idx);
887 else 853 else
888 trigger = default_PCI_trigger(idx); 854 trigger = default_PCI_trigger(idx);
889#ifdef CONFIG_EISA 855 /* Take EISA into account */
890 switch (mp_bus_id_to_type[bus]) { 856 return eisa_irq_trigger(idx, bus, trigger);
891 case MP_BUS_ISA: /* ISA pin */ 857 case 1:
892 { 858 return IOAPIC_EDGE;
893 /* set before the switch */ 859 case 2:
894 break; 860 pr_warn("IOAPIC: Invalid trigger mode 2 defaulting to level\n");
895 } 861 case 3:
896 case MP_BUS_EISA: /* EISA pin */ 862 default: /* Pointless default required due to do gcc stupidity */
897 { 863 return IOAPIC_LEVEL;
898 trigger = default_EISA_trigger(idx); 864 }
899 break; 865}
900 } 866
901 case MP_BUS_PCI: /* PCI pin */ 867void ioapic_set_alloc_attr(struct irq_alloc_info *info, int node,
902 { 868 int trigger, int polarity)
903 /* set before the switch */ 869{
904 break; 870 init_irq_alloc_info(info, NULL);
905 } 871 info->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
906 default: 872 info->ioapic_node = node;
907 { 873 info->ioapic_trigger = trigger;
908 pr_warn("broken BIOS!!\n"); 874 info->ioapic_polarity = polarity;
909 trigger = 1; 875 info->ioapic_valid = 1;
910 break; 876}
911 } 877
912 } 878#ifndef CONFIG_ACPI
879int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity);
913#endif 880#endif
914 break; 881
915 case 1: /* edge */ 882static void ioapic_copy_alloc_attr(struct irq_alloc_info *dst,
916 { 883 struct irq_alloc_info *src,
917 trigger = 0; 884 u32 gsi, int ioapic_idx, int pin)
918 break; 885{
919 } 886 int trigger, polarity;
920 case 2: /* reserved */ 887
921 { 888 copy_irq_alloc_info(dst, src);
922 pr_warn("broken BIOS!!\n"); 889 dst->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
923 trigger = 1; 890 dst->ioapic_id = mpc_ioapic_id(ioapic_idx);
924 break; 891 dst->ioapic_pin = pin;
925 } 892 dst->ioapic_valid = 1;
926 case 3: /* level */ 893 if (src && src->ioapic_valid) {
927 { 894 dst->ioapic_node = src->ioapic_node;
928 trigger = 1; 895 dst->ioapic_trigger = src->ioapic_trigger;
929 break; 896 dst->ioapic_polarity = src->ioapic_polarity;
930 } 897 } else {
931 default: /* invalid */ 898 dst->ioapic_node = NUMA_NO_NODE;
932 { 899 if (acpi_get_override_irq(gsi, &trigger, &polarity) >= 0) {
933 pr_warn("broken BIOS!!\n"); 900 dst->ioapic_trigger = trigger;
934 trigger = 0; 901 dst->ioapic_polarity = polarity;
935 break; 902 } else {
903 /*
904 * PCI interrupts are always active low level
905 * triggered.
906 */
907 dst->ioapic_trigger = IOAPIC_LEVEL;
908 dst->ioapic_polarity = IOAPIC_POL_LOW;
936 } 909 }
937 } 910 }
938 return trigger;
939} 911}
940 912
941static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int pin) 913static int ioapic_alloc_attr_node(struct irq_alloc_info *info)
914{
915 return (info && info->ioapic_valid) ? info->ioapic_node : NUMA_NO_NODE;
916}
917
918static void mp_register_handler(unsigned int irq, unsigned long trigger)
919{
920 irq_flow_handler_t hdl;
921 bool fasteoi;
922
923 if (trigger) {
924 irq_set_status_flags(irq, IRQ_LEVEL);
925 fasteoi = true;
926 } else {
927 irq_clear_status_flags(irq, IRQ_LEVEL);
928 fasteoi = false;
929 }
930
931 hdl = fasteoi ? handle_fasteoi_irq : handle_edge_irq;
932 __irq_set_handler(irq, hdl, 0, fasteoi ? "fasteoi" : "edge");
933}
934
935static bool mp_check_pin_attr(int irq, struct irq_alloc_info *info)
942{ 936{
937 struct mp_chip_data *data = irq_get_chip_data(irq);
938
939 /*
940 * setup_IO_APIC_irqs() programs all legacy IRQs with default trigger
 941 * and polarity attributes. So allow the first user to reprogram the
942 * pin with real trigger and polarity attributes.
943 */
944 if (irq < nr_legacy_irqs() && data->count == 1) {
945 if (info->ioapic_trigger != data->trigger)
946 mp_register_handler(irq, data->trigger);
947 data->entry.trigger = data->trigger = info->ioapic_trigger;
948 data->entry.polarity = data->polarity = info->ioapic_polarity;
949 }
950
951 return data->trigger == info->ioapic_trigger &&
952 data->polarity == info->ioapic_polarity;
953}
954
955static int alloc_irq_from_domain(struct irq_domain *domain, int ioapic, u32 gsi,
956 struct irq_alloc_info *info)
957{
958 bool legacy = false;
943 int irq = -1; 959 int irq = -1;
944 int ioapic = (int)(long)domain->host_data;
945 int type = ioapics[ioapic].irqdomain_cfg.type; 960 int type = ioapics[ioapic].irqdomain_cfg.type;
946 961
947 switch (type) { 962 switch (type) {
948 case IOAPIC_DOMAIN_LEGACY: 963 case IOAPIC_DOMAIN_LEGACY:
949 /* 964 /*
950 * Dynamically allocate IRQ number for non-ISA IRQs in the first 16 965 * Dynamically allocate IRQ number for non-ISA IRQs in the first
951 * GSIs on some weird platforms. 966 * 16 GSIs on some weird platforms.
952 */ 967 */
953 if (gsi < nr_legacy_irqs()) 968 if (!ioapic_initialized || gsi >= nr_legacy_irqs())
954 irq = irq_create_mapping(domain, pin);
955 else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0)
956 irq = gsi; 969 irq = gsi;
970 legacy = mp_is_legacy_irq(irq);
957 break; 971 break;
958 case IOAPIC_DOMAIN_STRICT: 972 case IOAPIC_DOMAIN_STRICT:
959 if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) 973 irq = gsi;
960 irq = gsi;
961 break; 974 break;
962 case IOAPIC_DOMAIN_DYNAMIC: 975 case IOAPIC_DOMAIN_DYNAMIC:
963 irq = irq_create_mapping(domain, pin);
964 break; 976 break;
965 default: 977 default:
966 WARN(1, "ioapic: unknown irqdomain type %d\n", type); 978 WARN(1, "ioapic: unknown irqdomain type %d\n", type);
967 break; 979 return -1;
980 }
981
982 return __irq_domain_alloc_irqs(domain, irq, 1,
983 ioapic_alloc_attr_node(info),
984 info, legacy);
985}
986
987/*
988 * Need special handling for ISA IRQs because there may be multiple IOAPIC pins
989 * sharing the same ISA IRQ number and irqdomain only supports 1:1 mapping
990 * between IOAPIC pin and IRQ number. A typical IOAPIC has 24 pins, pin 0-15 are
991 * used for legacy IRQs and pin 16-23 are used for PCI IRQs (PIRQ A-H).
992 * When ACPI is disabled, only legacy IRQ numbers (IRQ0-15) are available, and
993 * some BIOSes may use MP Interrupt Source records to override IRQ numbers for
994 * PIRQs instead of reprogramming the interrupt routing logic. Thus there may be
995 * multiple pins sharing the same legacy IRQ number when ACPI is disabled.
996 */
997static int alloc_isa_irq_from_domain(struct irq_domain *domain,
998 int irq, int ioapic, int pin,
999 struct irq_alloc_info *info)
1000{
1001 struct mp_chip_data *data;
1002 struct irq_data *irq_data = irq_get_irq_data(irq);
1003 int node = ioapic_alloc_attr_node(info);
1004
1005 /*
1006 * Legacy ISA IRQ has already been allocated, just add pin to
1007 * the pin list associated with this IRQ and program the IOAPIC
1008 * entry.
1009 */
1010 if (irq_data && irq_data->parent_data) {
1011 if (!mp_check_pin_attr(irq, info))
1012 return -EBUSY;
1013 if (__add_pin_to_irq_node(irq_data->chip_data, node, ioapic,
1014 info->ioapic_pin))
1015 return -ENOMEM;
1016 } else {
1017 irq = __irq_domain_alloc_irqs(domain, irq, 1, node, info, true);
1018 if (irq >= 0) {
1019 irq_data = irq_domain_get_irq_data(domain, irq);
1020 data = irq_data->chip_data;
1021 data->isa_irq = true;
1022 }
968 } 1023 }
969 1024
970 return irq > 0 ? irq : -1; 1025 return irq;
971} 1026}
972 1027
973static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin, 1028static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin,
974 unsigned int flags) 1029 unsigned int flags, struct irq_alloc_info *info)
975{ 1030{
976 int irq; 1031 int irq;
1032 bool legacy = false;
1033 struct irq_alloc_info tmp;
1034 struct mp_chip_data *data;
977 struct irq_domain *domain = mp_ioapic_irqdomain(ioapic); 1035 struct irq_domain *domain = mp_ioapic_irqdomain(ioapic);
978 struct mp_pin_info *info = mp_pin_info(ioapic, pin);
979 1036
980 if (!domain) 1037 if (!domain)
981 return -1; 1038 return -ENOSYS;
982 1039
983 mutex_lock(&ioapic_mutex);
984
985 /*
986 * Don't use irqdomain to manage ISA IRQs because there may be
987 * multiple IOAPIC pins sharing the same ISA IRQ number and
988 * irqdomain only supports 1:1 mapping between IOAPIC pin and
989 * IRQ number. A typical IOAPIC has 24 pins, pin 0-15 are used
990 * for legacy IRQs and pin 16-23 are used for PCI IRQs (PIRQ A-H).
991 * When ACPI is disabled, only legacy IRQ numbers (IRQ0-15) are
992 * available, and some BIOSes may use MP Interrupt Source records
993 * to override IRQ numbers for PIRQs instead of reprogramming
994 * the interrupt routing logic. Thus there may be multiple pins
995 * sharing the same legacy IRQ number when ACPI is disabled.
996 */
997 if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci)) { 1040 if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci)) {
998 irq = mp_irqs[idx].srcbusirq; 1041 irq = mp_irqs[idx].srcbusirq;
999 if (flags & IOAPIC_MAP_ALLOC) { 1042 legacy = mp_is_legacy_irq(irq);
1000 if (info->count == 0 && 1043 }
1001 mp_irqdomain_map(domain, irq, pin) != 0)
1002 irq = -1;
1003 1044
1004 /* special handling for timer IRQ0 */ 1045 mutex_lock(&ioapic_mutex);
1046 if (!(flags & IOAPIC_MAP_ALLOC)) {
1047 if (!legacy) {
1048 irq = irq_find_mapping(domain, pin);
1005 if (irq == 0) 1049 if (irq == 0)
1006 info->count++; 1050 irq = -ENOENT;
1007 } 1051 }
1008 } else { 1052 } else {
1009 irq = irq_find_mapping(domain, pin); 1053 ioapic_copy_alloc_attr(&tmp, info, gsi, ioapic, pin);
1010 if (irq <= 0 && (flags & IOAPIC_MAP_ALLOC)) 1054 if (legacy)
1011 irq = alloc_irq_from_domain(domain, gsi, pin); 1055 irq = alloc_isa_irq_from_domain(domain, irq,
1012 } 1056 ioapic, pin, &tmp);
1013 1057 else if ((irq = irq_find_mapping(domain, pin)) == 0)
1014 if (flags & IOAPIC_MAP_ALLOC) { 1058 irq = alloc_irq_from_domain(domain, ioapic, gsi, &tmp);
1015 /* special handling for legacy IRQs */ 1059 else if (!mp_check_pin_attr(irq, &tmp))
1016 if (irq < nr_legacy_irqs() && info->count == 1 && 1060 irq = -EBUSY;
1017 mp_irqdomain_map(domain, irq, pin) != 0) 1061 if (irq >= 0) {
1018 irq = -1; 1062 data = irq_get_chip_data(irq);
1019 1063 data->count++;
1020 if (irq > 0) 1064 }
1021 info->count++;
1022 else if (info->count == 0)
1023 info->set = 0;
1024 } 1065 }
1025
1026 mutex_unlock(&ioapic_mutex); 1066 mutex_unlock(&ioapic_mutex);
1027 1067
1028 return irq > 0 ? irq : -1; 1068 return irq;
1029} 1069}
1030 1070
1031static int pin_2_irq(int idx, int ioapic, int pin, unsigned int flags) 1071static int pin_2_irq(int idx, int ioapic, int pin, unsigned int flags)
@@ -1058,10 +1098,10 @@ static int pin_2_irq(int idx, int ioapic, int pin, unsigned int flags)
 	}
 #endif
 
-	return mp_map_pin_to_irq(gsi, idx, ioapic, pin, flags);
+	return mp_map_pin_to_irq(gsi, idx, ioapic, pin, flags, NULL);
 }
 
-int mp_map_gsi_to_irq(u32 gsi, unsigned int flags)
+int mp_map_gsi_to_irq(u32 gsi, unsigned int flags, struct irq_alloc_info *info)
 {
 	int ioapic, pin, idx;
 
@@ -1074,31 +1114,24 @@ int mp_map_gsi_to_irq(u32 gsi, unsigned int flags)
 	if ((flags & IOAPIC_MAP_CHECK) && idx < 0)
 		return -1;
 
-	return mp_map_pin_to_irq(gsi, idx, ioapic, pin, flags);
+	return mp_map_pin_to_irq(gsi, idx, ioapic, pin, flags, info);
 }
 
 void mp_unmap_irq(int irq)
 {
-	struct irq_data *data = irq_get_irq_data(irq);
-	struct mp_pin_info *info;
-	int ioapic, pin;
+	struct irq_data *irq_data = irq_get_irq_data(irq);
+	struct mp_chip_data *data;
 
-	if (!data || !data->domain)
+	if (!irq_data || !irq_data->domain)
 		return;
 
-	ioapic = (int)(long)data->domain->host_data;
-	pin = (int)data->hwirq;
-	info = mp_pin_info(ioapic, pin);
+	data = irq_data->chip_data;
+	if (!data || data->isa_irq)
+		return;
 
 	mutex_lock(&ioapic_mutex);
-	if (--info->count == 0) {
-		info->set = 0;
-		if (irq < nr_legacy_irqs() &&
-		    ioapics[ioapic].irqdomain_cfg.type == IOAPIC_DOMAIN_LEGACY)
-			mp_irqdomain_unmap(data->domain, irq);
-		else
-			irq_dispose_mapping(irq);
-	}
+	if (--data->count == 0)
+		irq_domain_free_irqs(irq, 1);
 	mutex_unlock(&ioapic_mutex);
 }
1104 1137
@@ -1165,7 +1198,7 @@ out:
 }
 EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
 
-static struct irq_chip ioapic_chip;
+static struct irq_chip ioapic_chip, ioapic_ir_chip;
 
 #ifdef CONFIG_X86_32
 static inline int IO_APIC_irq_trigger(int irq)
@@ -1189,96 +1222,6 @@ static inline int IO_APIC_irq_trigger(int irq)
 }
 #endif
1191 1224
1192static void ioapic_register_intr(unsigned int irq, struct irq_cfg *cfg,
1193 unsigned long trigger)
1194{
1195 struct irq_chip *chip = &ioapic_chip;
1196 irq_flow_handler_t hdl;
1197 bool fasteoi;
1198
1199 if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
1200 trigger == IOAPIC_LEVEL) {
1201 irq_set_status_flags(irq, IRQ_LEVEL);
1202 fasteoi = true;
1203 } else {
1204 irq_clear_status_flags(irq, IRQ_LEVEL);
1205 fasteoi = false;
1206 }
1207
1208 if (setup_remapped_irq(irq, cfg, chip))
1209 fasteoi = trigger != 0;
1210
1211 hdl = fasteoi ? handle_fasteoi_irq : handle_edge_irq;
1212 irq_set_chip_and_handler_name(irq, chip, hdl,
1213 fasteoi ? "fasteoi" : "edge");
1214}
1215
1216int native_setup_ioapic_entry(int irq, struct IO_APIC_route_entry *entry,
1217 unsigned int destination, int vector,
1218 struct io_apic_irq_attr *attr)
1219{
1220 memset(entry, 0, sizeof(*entry));
1221
1222 entry->delivery_mode = apic->irq_delivery_mode;
1223 entry->dest_mode = apic->irq_dest_mode;
1224 entry->dest = destination;
1225 entry->vector = vector;
1226 entry->mask = 0; /* enable IRQ */
1227 entry->trigger = attr->trigger;
1228 entry->polarity = attr->polarity;
1229
1230 /*
1231 * Mask level triggered irqs.
1232 * Use IRQ_DELAYED_DISABLE for edge triggered irqs.
1233 */
1234 if (attr->trigger)
1235 entry->mask = 1;
1236
1237 return 0;
1238}
1239
1240static void setup_ioapic_irq(unsigned int irq, struct irq_cfg *cfg,
1241 struct io_apic_irq_attr *attr)
1242{
1243 struct IO_APIC_route_entry entry;
1244 unsigned int dest;
1245
1246 if (!IO_APIC_IRQ(irq))
1247 return;
1248
1249 if (assign_irq_vector(irq, cfg, apic->target_cpus()))
1250 return;
1251
1252 if (apic->cpu_mask_to_apicid_and(cfg->domain, apic->target_cpus(),
1253 &dest)) {
1254 pr_warn("Failed to obtain apicid for ioapic %d, pin %d\n",
1255 mpc_ioapic_id(attr->ioapic), attr->ioapic_pin);
1256 clear_irq_vector(irq, cfg);
1257
1258 return;
1259 }
1260
1261 apic_printk(APIC_VERBOSE,KERN_DEBUG
1262 "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> "
1263 "IRQ %d Mode:%i Active:%i Dest:%d)\n",
1264 attr->ioapic, mpc_ioapic_id(attr->ioapic), attr->ioapic_pin,
1265 cfg->vector, irq, attr->trigger, attr->polarity, dest);
1266
1267 if (x86_io_apic_ops.setup_entry(irq, &entry, dest, cfg->vector, attr)) {
1268 pr_warn("Failed to setup ioapic entry for ioapic %d, pin %d\n",
1269 mpc_ioapic_id(attr->ioapic), attr->ioapic_pin);
1270 clear_irq_vector(irq, cfg);
1271
1272 return;
1273 }
1274
1275 ioapic_register_intr(irq, cfg, attr->trigger);
1276 if (irq < nr_legacy_irqs())
1277 legacy_pic->mask(irq);
1278
1279 ioapic_write_entry(attr->ioapic, attr->ioapic_pin, entry);
1280}
1281
1282static void __init setup_IO_APIC_irqs(void) 1225static void __init setup_IO_APIC_irqs(void)
1283{ 1226{
1284 unsigned int ioapic, pin; 1227 unsigned int ioapic, pin;
@@ -1298,106 +1241,41 @@ static void __init setup_IO_APIC_irqs(void)
1298 } 1241 }
1299} 1242}
1300 1243
1301/* 1244void ioapic_zap_locks(void)
1302 * Set up the timer pin, possibly with the 8259A-master behind.
1303 */
1304static void __init setup_timer_IRQ0_pin(unsigned int ioapic_idx,
1305 unsigned int pin, int vector)
1306{
1307 struct IO_APIC_route_entry entry;
1308 unsigned int dest;
1309
1310 memset(&entry, 0, sizeof(entry));
1311
1312 /*
1313 * We use logical delivery to get the timer IRQ
1314 * to the first CPU.
1315 */
1316 if (unlikely(apic->cpu_mask_to_apicid_and(apic->target_cpus(),
1317 apic->target_cpus(), &dest)))
1318 dest = BAD_APICID;
1319
1320 entry.dest_mode = apic->irq_dest_mode;
1321 entry.mask = 0; /* don't mask IRQ for edge */
1322 entry.dest = dest;
1323 entry.delivery_mode = apic->irq_delivery_mode;
1324 entry.polarity = 0;
1325 entry.trigger = 0;
1326 entry.vector = vector;
1327
1328 /*
1329 * The timer IRQ doesn't have to know that behind the
1330 * scene we may have a 8259A-master in AEOI mode ...
1331 */
1332 irq_set_chip_and_handler_name(0, &ioapic_chip, handle_edge_irq,
1333 "edge");
1334
1335 /*
1336 * Add it to the IO-APIC irq-routing table:
1337 */
1338 ioapic_write_entry(ioapic_idx, pin, entry);
1339}
1340
1341void native_io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
1342{ 1245{
1343 int i; 1246 raw_spin_lock_init(&ioapic_lock);
1344
1345 pr_debug(" NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:\n");
1346
1347 for (i = 0; i <= nr_entries; i++) {
1348 struct IO_APIC_route_entry entry;
1349
1350 entry = ioapic_read_entry(apic, i);
1351
1352 pr_debug(" %02x %02X ", i, entry.dest);
1353 pr_cont("%1d %1d %1d %1d %1d "
1354 "%1d %1d %02X\n",
1355 entry.mask,
1356 entry.trigger,
1357 entry.irr,
1358 entry.polarity,
1359 entry.delivery_status,
1360 entry.dest_mode,
1361 entry.delivery_mode,
1362 entry.vector);
1363 }
1364} 1247}
1365 1248
1366void intel_ir_io_apic_print_entries(unsigned int apic, 1249static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries)
1367 unsigned int nr_entries)
1368{ 1250{
1369 int i; 1251 int i;
1252 char buf[256];
1253 struct IO_APIC_route_entry entry;
1254 struct IR_IO_APIC_route_entry *ir_entry = (void *)&entry;
1370 1255
1371 pr_debug(" NR Indx Fmt Mask Trig IRR Pol Stat Indx2 Zero Vect:\n"); 1256 printk(KERN_DEBUG "IOAPIC %d:\n", apic);
1372
1373 for (i = 0; i <= nr_entries; i++) { 1257 for (i = 0; i <= nr_entries; i++) {
1374 struct IR_IO_APIC_route_entry *ir_entry;
1375 struct IO_APIC_route_entry entry;
1376
1377 entry = ioapic_read_entry(apic, i); 1258 entry = ioapic_read_entry(apic, i);
1378 1259 snprintf(buf, sizeof(buf),
1379 ir_entry = (struct IR_IO_APIC_route_entry *)&entry; 1260 " pin%02x, %s, %s, %s, V(%02X), IRR(%1d), S(%1d)",
1380 1261 i,
1381 pr_debug(" %02x %04X ", i, ir_entry->index); 1262 entry.mask == IOAPIC_MASKED ? "disabled" : "enabled ",
1382 pr_cont("%1d %1d %1d %1d %1d " 1263 entry.trigger == IOAPIC_LEVEL ? "level" : "edge ",
1383 "%1d %1d %X %02X\n", 1264 entry.polarity == IOAPIC_POL_LOW ? "low " : "high",
1384 ir_entry->format, 1265 entry.vector, entry.irr, entry.delivery_status);
1385 ir_entry->mask, 1266 if (ir_entry->format)
1386 ir_entry->trigger, 1267 printk(KERN_DEBUG "%s, remapped, I(%04X), Z(%X)\n",
1387 ir_entry->irr, 1268 buf, (ir_entry->index << 15) | ir_entry->index,
1388 ir_entry->polarity, 1269 ir_entry->zero);
1389 ir_entry->delivery_status, 1270 else
1390 ir_entry->index2, 1271 printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n",
1391 ir_entry->zero, 1272 buf,
1392 ir_entry->vector); 1273 entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ?
1274 "logical " : "physical",
1275 entry.dest, entry.delivery_mode);
1393 } 1276 }
1394} 1277}
1395 1278
1396void ioapic_zap_locks(void)
1397{
1398 raw_spin_lock_init(&ioapic_lock);
1399}
1400
1401static void __init print_IO_APIC(int ioapic_idx) 1279static void __init print_IO_APIC(int ioapic_idx)
1402{ 1280{
1403 union IO_APIC_reg_00 reg_00; 1281 union IO_APIC_reg_00 reg_00;
@@ -1451,16 +1329,13 @@ static void __init print_IO_APIC(int ioapic_idx)
1451 } 1329 }
1452 1330
1453 printk(KERN_DEBUG ".... IRQ redirection table:\n"); 1331 printk(KERN_DEBUG ".... IRQ redirection table:\n");
1454 1332 io_apic_print_entries(ioapic_idx, reg_01.bits.entries);
1455 x86_io_apic_ops.print_entries(ioapic_idx, reg_01.bits.entries);
1456} 1333}
1457 1334
1458void __init print_IO_APICs(void) 1335void __init print_IO_APICs(void)
1459{ 1336{
1460 int ioapic_idx; 1337 int ioapic_idx;
1461 struct irq_cfg *cfg;
1462 unsigned int irq; 1338 unsigned int irq;
1463 struct irq_chip *chip;
1464 1339
1465 printk(KERN_DEBUG "number of MP IRQ sources: %d.\n", mp_irq_entries); 1340 printk(KERN_DEBUG "number of MP IRQ sources: %d.\n", mp_irq_entries);
1466 for_each_ioapic(ioapic_idx) 1341 for_each_ioapic(ioapic_idx)
@@ -1480,18 +1355,20 @@ void __init print_IO_APICs(void)
1480 printk(KERN_DEBUG "IRQ to pin mappings:\n"); 1355 printk(KERN_DEBUG "IRQ to pin mappings:\n");
1481 for_each_active_irq(irq) { 1356 for_each_active_irq(irq) {
1482 struct irq_pin_list *entry; 1357 struct irq_pin_list *entry;
1358 struct irq_chip *chip;
1359 struct mp_chip_data *data;
1483 1360
1484 chip = irq_get_chip(irq); 1361 chip = irq_get_chip(irq);
1485 if (chip != &ioapic_chip) 1362 if (chip != &ioapic_chip && chip != &ioapic_ir_chip)
1486 continue; 1363 continue;
1487 1364 data = irq_get_chip_data(irq);
1488 cfg = irq_cfg(irq); 1365 if (!data)
1489 if (!cfg)
1490 continue; 1366 continue;
1491 if (list_empty(&cfg->irq_2_pin)) 1367 if (list_empty(&data->irq_2_pin))
1492 continue; 1368 continue;
1369
1493 printk(KERN_DEBUG "IRQ%d ", irq); 1370 printk(KERN_DEBUG "IRQ%d ", irq);
1494 for_each_irq_pin(entry, cfg->irq_2_pin) 1371 for_each_irq_pin(entry, data->irq_2_pin)
1495 pr_cont("-> %d:%d", entry->apic, entry->pin); 1372 pr_cont("-> %d:%d", entry->apic, entry->pin);
1496 pr_cont("\n"); 1373 pr_cont("\n");
1497 } 1374 }
@@ -1564,15 +1441,12 @@ void native_disable_io_apic(void)
 	struct IO_APIC_route_entry entry;
 
 	memset(&entry, 0, sizeof(entry));
-	entry.mask = 0; /* Enabled */
-	entry.trigger = 0; /* Edge */
-	entry.irr = 0;
-	entry.polarity = 0; /* High */
-	entry.delivery_status = 0;
-	entry.dest_mode = 0; /* Physical */
-	entry.delivery_mode = dest_ExtINT; /* ExtInt */
-	entry.vector = 0;
-	entry.dest = read_apic_id();
+	entry.mask = IOAPIC_UNMASKED;
+	entry.trigger = IOAPIC_EDGE;
+	entry.polarity = IOAPIC_POL_HIGH;
+	entry.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
+	entry.delivery_mode = dest_ExtINT;
+	entry.dest = read_apic_id();
 
 	/*
 	 * Add it to the IO-APIC irq-routing table:
@@ -1582,7 +1456,6 @@ void native_disable_io_apic(void)
 
 	if (cpu_has_apic || apic_from_smp_config())
 		disconnect_bsp_APIC(ioapic_i8259.pin != -1);
-
 }
1587 1460
1588/* 1461/*
@@ -1792,7 +1665,6 @@ static int __init timer_irq_works(void)
1792 * This is not complete - we should be able to fake 1665 * This is not complete - we should be able to fake
1793 * an edge even if it isn't on the 8259A... 1666 * an edge even if it isn't on the 8259A...
1794 */ 1667 */
1795
1796static unsigned int startup_ioapic_irq(struct irq_data *data) 1668static unsigned int startup_ioapic_irq(struct irq_data *data)
1797{ 1669{
1798 int was_pending = 0, irq = data->irq; 1670 int was_pending = 0, irq = data->irq;
@@ -1804,74 +1676,22 @@ static unsigned int startup_ioapic_irq(struct irq_data *data)
1804 if (legacy_pic->irq_pending(irq)) 1676 if (legacy_pic->irq_pending(irq))
1805 was_pending = 1; 1677 was_pending = 1;
1806 } 1678 }
1807 __unmask_ioapic(irqd_cfg(data)); 1679 __unmask_ioapic(data->chip_data);
1808 raw_spin_unlock_irqrestore(&ioapic_lock, flags); 1680 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
1809 1681
1810 return was_pending; 1682 return was_pending;
1811} 1683}
1812 1684
1813/*
1814 * Level and edge triggered IO-APIC interrupts need different handling,
1815 * so we use two separate IRQ descriptors. Edge triggered IRQs can be
1816 * handled with the level-triggered descriptor, but that one has slightly
1817 * more overhead. Level-triggered interrupts cannot be handled with the
1818 * edge-triggered handler, without risking IRQ storms and other ugly
1819 * races.
1820 */
1821
1822static void __target_IO_APIC_irq(unsigned int irq, unsigned int dest, struct irq_cfg *cfg)
1823{
1824 int apic, pin;
1825 struct irq_pin_list *entry;
1826 u8 vector = cfg->vector;
1827
1828 for_each_irq_pin(entry, cfg->irq_2_pin) {
1829 unsigned int reg;
1830
1831 apic = entry->apic;
1832 pin = entry->pin;
1833
1834 io_apic_write(apic, 0x11 + pin*2, dest);
1835 reg = io_apic_read(apic, 0x10 + pin*2);
1836 reg &= ~IO_APIC_REDIR_VECTOR_MASK;
1837 reg |= vector;
1838 io_apic_modify(apic, 0x10 + pin*2, reg);
1839 }
1840}
1841
1842int native_ioapic_set_affinity(struct irq_data *data,
1843 const struct cpumask *mask,
1844 bool force)
1845{
1846 unsigned int dest, irq = data->irq;
1847 unsigned long flags;
1848 int ret;
1849
1850 if (!config_enabled(CONFIG_SMP))
1851 return -EPERM;
1852
1853 raw_spin_lock_irqsave(&ioapic_lock, flags);
1854 ret = apic_set_affinity(data, mask, &dest);
1855 if (!ret) {
1856 /* Only the high 8 bits are valid. */
1857 dest = SET_APIC_LOGICAL_ID(dest);
1858 __target_IO_APIC_irq(irq, dest, irqd_cfg(data));
1859 ret = IRQ_SET_MASK_OK_NOCOPY;
1860 }
1861 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
1862 return ret;
1863}
1864
1865atomic_t irq_mis_count; 1685atomic_t irq_mis_count;
1866 1686
1867#ifdef CONFIG_GENERIC_PENDING_IRQ 1687#ifdef CONFIG_GENERIC_PENDING_IRQ
1868static bool io_apic_level_ack_pending(struct irq_cfg *cfg) 1688static bool io_apic_level_ack_pending(struct mp_chip_data *data)
1869{ 1689{
1870 struct irq_pin_list *entry; 1690 struct irq_pin_list *entry;
1871 unsigned long flags; 1691 unsigned long flags;
1872 1692
1873 raw_spin_lock_irqsave(&ioapic_lock, flags); 1693 raw_spin_lock_irqsave(&ioapic_lock, flags);
1874 for_each_irq_pin(entry, cfg->irq_2_pin) { 1694 for_each_irq_pin(entry, data->irq_2_pin) {
1875 unsigned int reg; 1695 unsigned int reg;
1876 int pin; 1696 int pin;
1877 1697
@@ -1888,18 +1708,17 @@ static bool io_apic_level_ack_pending(struct irq_cfg *cfg)
1888 return false; 1708 return false;
1889} 1709}
1890 1710
1891static inline bool ioapic_irqd_mask(struct irq_data *data, struct irq_cfg *cfg) 1711static inline bool ioapic_irqd_mask(struct irq_data *data)
1892{ 1712{
1893 /* If we are moving the irq we need to mask it */ 1713 /* If we are moving the irq we need to mask it */
1894 if (unlikely(irqd_is_setaffinity_pending(data))) { 1714 if (unlikely(irqd_is_setaffinity_pending(data))) {
1895 mask_ioapic(cfg); 1715 mask_ioapic_irq(data);
1896 return true; 1716 return true;
1897 } 1717 }
1898 return false; 1718 return false;
1899} 1719}
1900 1720
1901static inline void ioapic_irqd_unmask(struct irq_data *data, 1721static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked)
1902 struct irq_cfg *cfg, bool masked)
1903{ 1722{
1904 if (unlikely(masked)) { 1723 if (unlikely(masked)) {
1905 /* Only migrate the irq if the ack has been received. 1724 /* Only migrate the irq if the ack has been received.
@@ -1928,31 +1747,30 @@ static inline void ioapic_irqd_unmask(struct irq_data *data,
1928 * accurate and is causing problems then it is a hardware bug 1747 * accurate and is causing problems then it is a hardware bug
1929 * and you can go talk to the chipset vendor about it. 1748 * and you can go talk to the chipset vendor about it.
1930 */ 1749 */
1931 if (!io_apic_level_ack_pending(cfg)) 1750 if (!io_apic_level_ack_pending(data->chip_data))
1932 irq_move_masked_irq(data); 1751 irq_move_masked_irq(data);
1933 unmask_ioapic(cfg); 1752 unmask_ioapic_irq(data);
1934 } 1753 }
1935} 1754}
1936#else 1755#else
1937static inline bool ioapic_irqd_mask(struct irq_data *data, struct irq_cfg *cfg) 1756static inline bool ioapic_irqd_mask(struct irq_data *data)
1938{ 1757{
1939 return false; 1758 return false;
1940} 1759}
1941static inline void ioapic_irqd_unmask(struct irq_data *data, 1760static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked)
1942 struct irq_cfg *cfg, bool masked)
1943{ 1761{
1944} 1762}
1945#endif 1763#endif
1946 1764
1947static void ack_ioapic_level(struct irq_data *data) 1765static void ioapic_ack_level(struct irq_data *irq_data)
1948{ 1766{
1949 struct irq_cfg *cfg = irqd_cfg(data); 1767 struct irq_cfg *cfg = irqd_cfg(irq_data);
1950 int i, irq = data->irq;
1951 unsigned long v; 1768 unsigned long v;
1952 bool masked; 1769 bool masked;
1770 int i;
1953 1771
1954 irq_complete_move(cfg); 1772 irq_complete_move(cfg);
1955 masked = ioapic_irqd_mask(data, cfg); 1773 masked = ioapic_irqd_mask(irq_data);
1956 1774
1957 /* 1775 /*
1958 * It appears there is an erratum which affects at least version 0x11 1776 * It appears there is an erratum which affects at least version 0x11
@@ -2004,11 +1822,49 @@ static void ack_ioapic_level(struct irq_data *data)
2004 */ 1822 */
2005 if (!(v & (1 << (i & 0x1f)))) { 1823 if (!(v & (1 << (i & 0x1f)))) {
2006 atomic_inc(&irq_mis_count); 1824 atomic_inc(&irq_mis_count);
1825 eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
1826 }
1827
1828 ioapic_irqd_unmask(irq_data, masked);
1829}
1830
1831static void ioapic_ir_ack_level(struct irq_data *irq_data)
1832{
1833 struct mp_chip_data *data = irq_data->chip_data;
1834
1835 /*
1836 * Intr-remapping uses pin number as the virtual vector
1837 * in the RTE. Actual vector is programmed in
1838 * intr-remapping table entry. Hence for the io-apic
1839 * EOI we use the pin number.
1840 */
1841 ack_APIC_irq();
1842 eoi_ioapic_pin(data->entry.vector, data);
1843}
2007 1844
2008 eoi_ioapic_irq(irq, cfg); 1845static int ioapic_set_affinity(struct irq_data *irq_data,
1846 const struct cpumask *mask, bool force)
1847{
1848 struct irq_data *parent = irq_data->parent_data;
1849 struct mp_chip_data *data = irq_data->chip_data;
1850 struct irq_pin_list *entry;
1851 struct irq_cfg *cfg;
1852 unsigned long flags;
1853 int ret;
1854
1855 ret = parent->chip->irq_set_affinity(parent, mask, force);
1856 raw_spin_lock_irqsave(&ioapic_lock, flags);
1857 if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
1858 cfg = irqd_cfg(irq_data);
1859 data->entry.dest = cfg->dest_apicid;
1860 data->entry.vector = cfg->vector;
1861 for_each_irq_pin(entry, data->irq_2_pin)
1862 __ioapic_write_entry(entry->apic, entry->pin,
1863 data->entry);
2009 } 1864 }
1865 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
2010 1866
2011 ioapic_irqd_unmask(data, cfg, masked); 1867 return ret;
2012} 1868}
2013 1869
2014static struct irq_chip ioapic_chip __read_mostly = { 1870static struct irq_chip ioapic_chip __read_mostly = {
@@ -2016,10 +1872,20 @@ static struct irq_chip ioapic_chip __read_mostly = {
2016 .irq_startup = startup_ioapic_irq, 1872 .irq_startup = startup_ioapic_irq,
2017 .irq_mask = mask_ioapic_irq, 1873 .irq_mask = mask_ioapic_irq,
2018 .irq_unmask = unmask_ioapic_irq, 1874 .irq_unmask = unmask_ioapic_irq,
2019 .irq_ack = apic_ack_edge, 1875 .irq_ack = irq_chip_ack_parent,
2020 .irq_eoi = ack_ioapic_level, 1876 .irq_eoi = ioapic_ack_level,
2021 .irq_set_affinity = native_ioapic_set_affinity, 1877 .irq_set_affinity = ioapic_set_affinity,
2022 .irq_retrigger = apic_retrigger_irq, 1878 .flags = IRQCHIP_SKIP_SET_WAKE,
1879};
1880
1881static struct irq_chip ioapic_ir_chip __read_mostly = {
1882 .name = "IR-IO-APIC",
1883 .irq_startup = startup_ioapic_irq,
1884 .irq_mask = mask_ioapic_irq,
1885 .irq_unmask = unmask_ioapic_irq,
1886 .irq_ack = irq_chip_ack_parent,
1887 .irq_eoi = ioapic_ir_ack_level,
1888 .irq_set_affinity = ioapic_set_affinity,
2023 .flags = IRQCHIP_SKIP_SET_WAKE, 1889 .flags = IRQCHIP_SKIP_SET_WAKE,
2024}; 1890};
2025 1891
@@ -2113,12 +1979,12 @@ static inline void __init unlock_ExtINT_logic(void)
2113 1979
2114 memset(&entry1, 0, sizeof(entry1)); 1980 memset(&entry1, 0, sizeof(entry1));
2115 1981
2116 entry1.dest_mode = 0; /* physical delivery */ 1982 entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL;
2117 entry1.mask = 0; /* unmask IRQ now */ 1983 entry1.mask = IOAPIC_UNMASKED;
2118 entry1.dest = hard_smp_processor_id(); 1984 entry1.dest = hard_smp_processor_id();
2119 entry1.delivery_mode = dest_ExtINT; 1985 entry1.delivery_mode = dest_ExtINT;
2120 entry1.polarity = entry0.polarity; 1986 entry1.polarity = entry0.polarity;
2121 entry1.trigger = 0; 1987 entry1.trigger = IOAPIC_EDGE;
2122 entry1.vector = 0; 1988 entry1.vector = 0;
2123 1989
2124 ioapic_write_entry(apic, pin, entry1); 1990 ioapic_write_entry(apic, pin, entry1);
@@ -2152,6 +2018,25 @@ static int __init disable_timer_pin_setup(char *arg)
2152} 2018}
2153early_param("disable_timer_pin_1", disable_timer_pin_setup); 2019early_param("disable_timer_pin_1", disable_timer_pin_setup);
2154 2020
2021static int mp_alloc_timer_irq(int ioapic, int pin)
2022{
2023 int irq = -1;
2024 struct irq_domain *domain = mp_ioapic_irqdomain(ioapic);
2025
2026 if (domain) {
2027 struct irq_alloc_info info;
2028
2029 ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 0, 0);
2030 info.ioapic_id = mpc_ioapic_id(ioapic);
2031 info.ioapic_pin = pin;
2032 mutex_lock(&ioapic_mutex);
2033 irq = alloc_isa_irq_from_domain(domain, 0, ioapic, pin, &info);
2034 mutex_unlock(&ioapic_mutex);
2035 }
2036
2037 return irq;
2038}
2039
2155/* 2040/*
2156 * This code may look a bit paranoid, but it's supposed to cooperate with 2041 * This code may look a bit paranoid, but it's supposed to cooperate with
2157 * a wide range of boards and BIOS bugs. Fortunately only the timer IRQ 2042 * a wide range of boards and BIOS bugs. Fortunately only the timer IRQ
@@ -2162,7 +2047,9 @@ early_param("disable_timer_pin_1", disable_timer_pin_setup);
2162 */ 2047 */
2163static inline void __init check_timer(void) 2048static inline void __init check_timer(void)
2164{ 2049{
2165 struct irq_cfg *cfg = irq_cfg(0); 2050 struct irq_data *irq_data = irq_get_irq_data(0);
2051 struct mp_chip_data *data = irq_data->chip_data;
2052 struct irq_cfg *cfg = irqd_cfg(irq_data);
2166 int node = cpu_to_node(0); 2053 int node = cpu_to_node(0);
2167 int apic1, pin1, apic2, pin2; 2054 int apic1, pin1, apic2, pin2;
2168 unsigned long flags; 2055 unsigned long flags;
@@ -2174,7 +2061,6 @@ static inline void __init check_timer(void)
2174 * get/set the timer IRQ vector: 2061 * get/set the timer IRQ vector:
2175 */ 2062 */
2176 legacy_pic->mask(0); 2063 legacy_pic->mask(0);
2177 assign_irq_vector(0, cfg, apic->target_cpus());
2178 2064
2179 /* 2065 /*
2180 * As IRQ0 is to be enabled in the 8259A, the virtual 2066 * As IRQ0 is to be enabled in the 8259A, the virtual
@@ -2215,23 +2101,21 @@ static inline void __init check_timer(void)
2215 } 2101 }
2216 2102
2217 if (pin1 != -1) { 2103 if (pin1 != -1) {
2218 /* 2104 /* Ok, does IRQ0 through the IOAPIC work? */
2219 * Ok, does IRQ0 through the IOAPIC work?
2220 */
2221 if (no_pin1) { 2105 if (no_pin1) {
2222 add_pin_to_irq_node(cfg, node, apic1, pin1); 2106 mp_alloc_timer_irq(apic1, pin1);
2223 setup_timer_IRQ0_pin(apic1, pin1, cfg->vector);
2224 } else { 2107 } else {
2225 /* for edge trigger, setup_ioapic_irq already 2108 /*
2226 * leave it unmasked. 2109 * for edge trigger, it's already unmasked,
2227 * so only need to unmask if it is level-trigger 2110 * so only need to unmask if it is level-trigger
2228 * do we really have level trigger timer? 2111 * do we really have level trigger timer?
2229 */ 2112 */
2230 int idx; 2113 int idx;
2231 idx = find_irq_entry(apic1, pin1, mp_INT); 2114 idx = find_irq_entry(apic1, pin1, mp_INT);
2232 if (idx != -1 && irq_trigger(idx)) 2115 if (idx != -1 && irq_trigger(idx))
2233 unmask_ioapic(cfg); 2116 unmask_ioapic_irq(irq_get_chip_data(0));
2234 } 2117 }
2118 irq_domain_activate_irq(irq_data);
2235 if (timer_irq_works()) { 2119 if (timer_irq_works()) {
2236 if (disable_timer_pin_1 > 0) 2120 if (disable_timer_pin_1 > 0)
2237 clear_IO_APIC_pin(0, pin1); 2121 clear_IO_APIC_pin(0, pin1);
@@ -2251,8 +2135,8 @@ static inline void __init check_timer(void)
2251 /* 2135 /*
2252 * legacy devices should be connected to IO APIC #0 2136 * legacy devices should be connected to IO APIC #0
2253 */ 2137 */
2254 replace_pin_at_irq_node(cfg, node, apic1, pin1, apic2, pin2); 2138 replace_pin_at_irq_node(data, node, apic1, pin1, apic2, pin2);
2255 setup_timer_IRQ0_pin(apic2, pin2, cfg->vector); 2139 irq_domain_activate_irq(irq_data);
2256 legacy_pic->unmask(0); 2140 legacy_pic->unmask(0);
2257 if (timer_irq_works()) { 2141 if (timer_irq_works()) {
2258 apic_printk(APIC_QUIET, KERN_INFO "....... works.\n"); 2142 apic_printk(APIC_QUIET, KERN_INFO "....... works.\n");
@@ -2329,36 +2213,35 @@ out:
2329 2213
2330static int mp_irqdomain_create(int ioapic) 2214static int mp_irqdomain_create(int ioapic)
2331{ 2215{
2332 size_t size; 2216 struct irq_alloc_info info;
2217 struct irq_domain *parent;
2333 int hwirqs = mp_ioapic_pin_count(ioapic); 2218 int hwirqs = mp_ioapic_pin_count(ioapic);
2334 struct ioapic *ip = &ioapics[ioapic]; 2219 struct ioapic *ip = &ioapics[ioapic];
2335 struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg; 2220 struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
2336 struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic); 2221 struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
2337 2222
2338 size = sizeof(struct mp_pin_info) * mp_ioapic_pin_count(ioapic);
2339 ip->pin_info = kzalloc(size, GFP_KERNEL);
2340 if (!ip->pin_info)
2341 return -ENOMEM;
2342
2343 if (cfg->type == IOAPIC_DOMAIN_INVALID) 2223 if (cfg->type == IOAPIC_DOMAIN_INVALID)
2344 return 0; 2224 return 0;
2345 2225
2226 init_irq_alloc_info(&info, NULL);
2227 info.type = X86_IRQ_ALLOC_TYPE_IOAPIC;
2228 info.ioapic_id = mpc_ioapic_id(ioapic);
2229 parent = irq_remapping_get_ir_irq_domain(&info);
2230 if (!parent)
2231 parent = x86_vector_domain;
2232
2346 ip->irqdomain = irq_domain_add_linear(cfg->dev, hwirqs, cfg->ops, 2233 ip->irqdomain = irq_domain_add_linear(cfg->dev, hwirqs, cfg->ops,
2347 (void *)(long)ioapic); 2234 (void *)(long)ioapic);
2348 if(!ip->irqdomain) { 2235 if (!ip->irqdomain)
2349 kfree(ip->pin_info);
2350 ip->pin_info = NULL;
2351 return -ENOMEM; 2236 return -ENOMEM;
2352 } 2237
2238 ip->irqdomain->parent = parent;
2353 2239
2354 if (cfg->type == IOAPIC_DOMAIN_LEGACY || 2240 if (cfg->type == IOAPIC_DOMAIN_LEGACY ||
2355 cfg->type == IOAPIC_DOMAIN_STRICT) 2241 cfg->type == IOAPIC_DOMAIN_STRICT)
2356 ioapic_dynirq_base = max(ioapic_dynirq_base, 2242 ioapic_dynirq_base = max(ioapic_dynirq_base,
2357 gsi_cfg->gsi_end + 1); 2243 gsi_cfg->gsi_end + 1);
2358 2244
2359 if (gsi_cfg->gsi_base == 0)
2360 irq_set_default_host(ip->irqdomain);
2361
2362 return 0; 2245 return 0;
2363} 2246}
2364 2247
@@ -2368,8 +2251,6 @@ static void ioapic_destroy_irqdomain(int idx)
2368 irq_domain_remove(ioapics[idx].irqdomain); 2251 irq_domain_remove(ioapics[idx].irqdomain);
2369 ioapics[idx].irqdomain = NULL; 2252 ioapics[idx].irqdomain = NULL;
2370 } 2253 }
2371 kfree(ioapics[idx].pin_info);
2372 ioapics[idx].pin_info = NULL;
2373} 2254}
2374 2255
2375void __init setup_IO_APIC(void) 2256void __init setup_IO_APIC(void)
@@ -2399,20 +2280,6 @@ void __init setup_IO_APIC(void)
2399 ioapic_initialized = 1; 2280 ioapic_initialized = 1;
2400} 2281}
2401 2282
2402/*
2403 * Called after all the initialization is done. If we didn't find any
2404 * APIC bugs then we can allow the modify fast path
2405 */
2406
2407static int __init io_apic_bug_finalize(void)
2408{
2409 if (sis_apic_bug == -1)
2410 sis_apic_bug = 0;
2411 return 0;
2412}
2413
2414late_initcall(io_apic_bug_finalize);
2415
2416static void resume_ioapic_id(int ioapic_idx) 2283static void resume_ioapic_id(int ioapic_idx)
2417{ 2284{
2418 unsigned long flags; 2285 unsigned long flags;
@@ -2451,20 +2318,6 @@ static int __init ioapic_init_ops(void)
2451 2318
2452device_initcall(ioapic_init_ops); 2319device_initcall(ioapic_init_ops);
2453 2320
2454static int
2455io_apic_setup_irq_pin(unsigned int irq, int node, struct io_apic_irq_attr *attr)
2456{
2457 struct irq_cfg *cfg = alloc_irq_and_cfg_at(irq, node);
2458 int ret;
2459
2460 if (!cfg)
2461 return -EINVAL;
2462 ret = __add_pin_to_irq_node(cfg, node, attr->ioapic, attr->ioapic_pin);
2463 if (!ret)
2464 setup_ioapic_irq(irq, cfg, attr);
2465 return ret;
2466}
2467
2468static int io_apic_get_redir_entries(int ioapic) 2321static int io_apic_get_redir_entries(int ioapic)
2469{ 2322{
2470 union IO_APIC_reg_01 reg_01; 2323 union IO_APIC_reg_01 reg_01;
@@ -2692,7 +2545,7 @@ void __init setup_ioapic_dest(void)
2692 else 2545 else
2693 mask = apic->target_cpus(); 2546 mask = apic->target_cpus();
2694 2547
2695 x86_io_apic_ops.set_affinity(idata, mask, false); 2548 irq_set_affinity(irq, mask);
2696 } 2549 }
2697 2550
2698} 2551}
@@ -2737,7 +2590,7 @@ static struct resource * __init ioapic_setup_resources(void)
2737 return res; 2590 return res;
2738} 2591}
2739 2592
2740void __init native_io_apic_init_mappings(void) 2593void __init io_apic_init_mappings(void)
2741{ 2594{
2742 unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0; 2595 unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
2743 struct resource *ioapic_res; 2596 struct resource *ioapic_res;
@@ -2962,7 +2815,6 @@ int mp_unregister_ioapic(u32 gsi_base)
2962{ 2815{
2963 int ioapic, pin; 2816 int ioapic, pin;
2964 int found = 0; 2817 int found = 0;
2965 struct mp_pin_info *pin_info;
2966 2818
2967 for_each_ioapic(ioapic) 2819 for_each_ioapic(ioapic)
2968 if (ioapics[ioapic].gsi_config.gsi_base == gsi_base) { 2820 if (ioapics[ioapic].gsi_config.gsi_base == gsi_base) {
@@ -2975,11 +2827,17 @@ int mp_unregister_ioapic(u32 gsi_base)
2975 } 2827 }
2976 2828
2977 for_each_pin(ioapic, pin) { 2829 for_each_pin(ioapic, pin) {
2978 pin_info = mp_pin_info(ioapic, pin); 2830 u32 gsi = mp_pin_to_gsi(ioapic, pin);
2979 if (pin_info->count) { 2831 int irq = mp_map_gsi_to_irq(gsi, 0, NULL);
2980 pr_warn("pin%d on IOAPIC%d is still in use.\n", 2832 struct mp_chip_data *data;
2981 pin, ioapic); 2833
2982 return -EBUSY; 2834 if (irq >= 0) {
2835 data = irq_get_chip_data(irq);
2836 if (data && data->count) {
2837 pr_warn("pin%d on IOAPIC%d is still in use.\n",
2838 pin, ioapic);
2839 return -EBUSY;
2840 }
2983 } 2841 }
2984 } 2842 }
2985 2843
@@ -3006,108 +2864,141 @@ int mp_ioapic_registered(u32 gsi_base)
3006 return 0; 2864 return 0;
3007} 2865}
3008 2866
3009static inline void set_io_apic_irq_attr(struct io_apic_irq_attr *irq_attr, 2867static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data,
3010 int ioapic, int ioapic_pin, 2868 struct irq_alloc_info *info)
3011 int trigger, int polarity)
3012{ 2869{
3013 irq_attr->ioapic = ioapic; 2870 if (info && info->ioapic_valid) {
3014 irq_attr->ioapic_pin = ioapic_pin; 2871 data->trigger = info->ioapic_trigger;
3015 irq_attr->trigger = trigger; 2872 data->polarity = info->ioapic_polarity;
3016 irq_attr->polarity = polarity; 2873 } else if (acpi_get_override_irq(gsi, &data->trigger,
2874 &data->polarity) < 0) {
2875 /* PCI interrupts are always active low level triggered. */
2876 data->trigger = IOAPIC_LEVEL;
2877 data->polarity = IOAPIC_POL_LOW;
2878 }
3017} 2879}
3018 2880
3019int mp_irqdomain_map(struct irq_domain *domain, unsigned int virq, 2881static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
3020 irq_hw_number_t hwirq) 2882 struct IO_APIC_route_entry *entry)
3021{ 2883{
3022 int ioapic = (int)(long)domain->host_data; 2884 memset(entry, 0, sizeof(*entry));
3023 struct mp_pin_info *info = mp_pin_info(ioapic, hwirq); 2885 entry->delivery_mode = apic->irq_delivery_mode;
3024 struct io_apic_irq_attr attr; 2886 entry->dest_mode = apic->irq_dest_mode;
2887 entry->dest = cfg->dest_apicid;
2888 entry->vector = cfg->vector;
2889 entry->trigger = data->trigger;
2890 entry->polarity = data->polarity;
2891 /*
2892 * Mask level triggered irqs. Edge triggered irqs are masked
2893 * by the irq core code in case they fire.
2894 */
2895 if (data->trigger == IOAPIC_LEVEL)
2896 entry->mask = IOAPIC_MASKED;
2897 else
2898 entry->mask = IOAPIC_UNMASKED;
2899}
3025 2900
3026 /* Get default attribute if not set by caller yet */ 2901int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
3027 if (!info->set) { 2902 unsigned int nr_irqs, void *arg)
3028 u32 gsi = mp_pin_to_gsi(ioapic, hwirq); 2903{
2904 int ret, ioapic, pin;
2905 struct irq_cfg *cfg;
2906 struct irq_data *irq_data;
2907 struct mp_chip_data *data;
2908 struct irq_alloc_info *info = arg;
3029 2909
3030 if (acpi_get_override_irq(gsi, &info->trigger, 2910 if (!info || nr_irqs > 1)
3031 &info->polarity) < 0) { 2911 return -EINVAL;
3032 /* 2912 irq_data = irq_domain_get_irq_data(domain, virq);
3033 * PCI interrupts are always polarity one level 2913 if (!irq_data)
3034 * triggered. 2914 return -EINVAL;
3035 */
3036 info->trigger = 1;
3037 info->polarity = 1;
3038 }
3039 info->node = NUMA_NO_NODE;
3040 2915
3041 /* 2916 ioapic = mp_irqdomain_ioapic_idx(domain);
3042 * setup_IO_APIC_irqs() programs all legacy IRQs with default 2917 pin = info->ioapic_pin;
3043 * trigger and polarity attributes. Don't set the flag for that 2918 if (irq_find_mapping(domain, (irq_hw_number_t)pin) > 0)
3044 * case so the first legacy IRQ user could reprogram the pin 2919 return -EEXIST;
3045 * with real trigger and polarity attributes.
3046 */
3047 if (virq >= nr_legacy_irqs() || info->count)
3048 info->set = 1;
3049 }
3050 set_io_apic_irq_attr(&attr, ioapic, hwirq, info->trigger,
3051 info->polarity);
3052 2920
3053 return io_apic_setup_irq_pin(virq, info->node, &attr); 2921 data = kzalloc(sizeof(*data), GFP_KERNEL);
3054} 2922 if (!data)
2923 return -ENOMEM;
3055 2924
3056void mp_irqdomain_unmap(struct irq_domain *domain, unsigned int virq) 2925 info->ioapic_entry = &data->entry;
3057{ 2926 ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
3058 struct irq_data *data = irq_get_irq_data(virq); 2927 if (ret < 0) {
3059 struct irq_cfg *cfg = irq_cfg(virq); 2928 kfree(data);
3060 int ioapic = (int)(long)domain->host_data; 2929 return ret;
3061 int pin = (int)data->hwirq; 2930 }
2931
2932 INIT_LIST_HEAD(&data->irq_2_pin);
2933 irq_data->hwirq = info->ioapic_pin;
2934 irq_data->chip = (domain->parent == x86_vector_domain) ?
2935 &ioapic_chip : &ioapic_ir_chip;
2936 irq_data->chip_data = data;
2937 mp_irqdomain_get_attr(mp_pin_to_gsi(ioapic, pin), data, info);
2938
2939 cfg = irqd_cfg(irq_data);
2940 add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin);
2941 if (info->ioapic_entry)
2942 mp_setup_entry(cfg, data, info->ioapic_entry);
2943 mp_register_handler(virq, data->trigger);
2944 if (virq < nr_legacy_irqs())
2945 legacy_pic->mask(virq);
2946
2947 apic_printk(APIC_VERBOSE, KERN_DEBUG
2948 "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i Dest:%d)\n",
2949 ioapic, mpc_ioapic_id(ioapic), pin, cfg->vector,
2950 virq, data->trigger, data->polarity, cfg->dest_apicid);
3062 2951
3063 ioapic_mask_entry(ioapic, pin); 2952 return 0;
3064 __remove_pin_from_irq(cfg, ioapic, pin);
3065 WARN_ON(!list_empty(&cfg->irq_2_pin));
3066 arch_teardown_hwirq(virq);
3067} 2953}
3068 2954
3069int mp_set_gsi_attr(u32 gsi, int trigger, int polarity, int node) 2955void mp_irqdomain_free(struct irq_domain *domain, unsigned int virq,
2956 unsigned int nr_irqs)
3070{ 2957{
3071 int ret = 0; 2958 struct irq_data *irq_data;
3072 int ioapic, pin; 2959 struct mp_chip_data *data;
3073 struct mp_pin_info *info;
3074 2960
3075 ioapic = mp_find_ioapic(gsi); 2961 BUG_ON(nr_irqs != 1);
3076 if (ioapic < 0) 2962 irq_data = irq_domain_get_irq_data(domain, virq);
3077 return -ENODEV; 2963 if (irq_data && irq_data->chip_data) {
3078 2964 data = irq_data->chip_data;
3079 pin = mp_find_ioapic_pin(ioapic, gsi); 2965 __remove_pin_from_irq(data, mp_irqdomain_ioapic_idx(domain),
3080 info = mp_pin_info(ioapic, pin); 2966 (int)irq_data->hwirq);
3081 trigger = trigger ? 1 : 0; 2967 WARN_ON(!list_empty(&data->irq_2_pin));
3082 polarity = polarity ? 1 : 0; 2968 kfree(irq_data->chip_data);
3083
3084 mutex_lock(&ioapic_mutex);
3085 if (!info->set) {
3086 info->trigger = trigger;
3087 info->polarity = polarity;
3088 info->node = node;
3089 info->set = 1;
3090 } else if (info->trigger != trigger || info->polarity != polarity) {
3091 ret = -EBUSY;
3092 } 2969 }
3093 mutex_unlock(&ioapic_mutex); 2970 irq_domain_free_irqs_top(domain, virq, nr_irqs);
3094
3095 return ret;
3096} 2971}
3097 2972
3098/* Enable IOAPIC early just for system timer */ 2973void mp_irqdomain_activate(struct irq_domain *domain,
3099void __init pre_init_apic_IRQ0(void) 2974 struct irq_data *irq_data)
3100{ 2975{
3101 struct io_apic_irq_attr attr = { 0, 0, 0, 0 }; 2976 unsigned long flags;
2977 struct irq_pin_list *entry;
2978 struct mp_chip_data *data = irq_data->chip_data;
3102 2979
3103 printk(KERN_INFO "Early APIC setup for system timer0\n"); 2980 raw_spin_lock_irqsave(&ioapic_lock, flags);
3104#ifndef CONFIG_SMP 2981 for_each_irq_pin(entry, data->irq_2_pin)
3105 physid_set_mask_of_physid(boot_cpu_physical_apicid, 2982 __ioapic_write_entry(entry->apic, entry->pin, data->entry);
3106 &phys_cpu_present_map); 2983 raw_spin_unlock_irqrestore(&ioapic_lock, flags);
3107#endif 2984}
3108 setup_local_APIC();
3109 2985
3110 io_apic_setup_irq_pin(0, 0, &attr); 2986void mp_irqdomain_deactivate(struct irq_domain *domain,
3111 irq_set_chip_and_handler_name(0, &ioapic_chip, handle_edge_irq, 2987 struct irq_data *irq_data)
3112 "edge"); 2988{
2989 /* It won't be called for IRQ with multiple IOAPIC pins associated */
2990 ioapic_mask_entry(mp_irqdomain_ioapic_idx(domain),
2991 (int)irq_data->hwirq);
2992}
2993
2994int mp_irqdomain_ioapic_idx(struct irq_domain *domain)
2995{
2996 return (int)(long)domain->host_data;
3113} 2997}
2998
2999const struct irq_domain_ops mp_ioapic_irqdomain_ops = {
3000 .alloc = mp_irqdomain_alloc,
3001 .free = mp_irqdomain_free,
3002 .activate = mp_irqdomain_activate,
3003 .deactivate = mp_irqdomain_deactivate,
3004};
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index d6ba2d660dc5..1a9d735e09c6 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -3,6 +3,8 @@
3 * 3 *
4 * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo 4 * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo
5 * Moved from arch/x86/kernel/apic/io_apic.c. 5 * Moved from arch/x86/kernel/apic/io_apic.c.
6 * Jiang Liu <jiang.liu@linux.intel.com>
7 * Convert to hierarchical irqdomain
6 * 8 *
7 * This program is free software; you can redistribute it and/or modify 9 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 10 * it under the terms of the GNU General Public License version 2 as
@@ -14,22 +16,23 @@
14#include <linux/dmar.h> 16#include <linux/dmar.h>
15#include <linux/hpet.h> 17#include <linux/hpet.h>
16#include <linux/msi.h> 18#include <linux/msi.h>
19#include <asm/irqdomain.h>
17#include <asm/msidef.h> 20#include <asm/msidef.h>
18#include <asm/hpet.h> 21#include <asm/hpet.h>
19#include <asm/hw_irq.h> 22#include <asm/hw_irq.h>
20#include <asm/apic.h> 23#include <asm/apic.h>
21#include <asm/irq_remapping.h> 24#include <asm/irq_remapping.h>
22 25
23void native_compose_msi_msg(struct pci_dev *pdev, 26static struct irq_domain *msi_default_domain;
24 unsigned int irq, unsigned int dest, 27
25 struct msi_msg *msg, u8 hpet_id) 28static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
26{ 29{
27 struct irq_cfg *cfg = irq_cfg(irq); 30 struct irq_cfg *cfg = irqd_cfg(data);
28 31
29 msg->address_hi = MSI_ADDR_BASE_HI; 32 msg->address_hi = MSI_ADDR_BASE_HI;
30 33
31 if (x2apic_enabled()) 34 if (x2apic_enabled())
32 msg->address_hi |= MSI_ADDR_EXT_DEST_ID(dest); 35 msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
33 36
34 msg->address_lo = 37 msg->address_lo =
35 MSI_ADDR_BASE_LO | 38 MSI_ADDR_BASE_LO |
@@ -39,7 +42,7 @@ void native_compose_msi_msg(struct pci_dev *pdev,
39 ((apic->irq_delivery_mode != dest_LowestPrio) ? 42 ((apic->irq_delivery_mode != dest_LowestPrio) ?
40 MSI_ADDR_REDIRECTION_CPU : 43 MSI_ADDR_REDIRECTION_CPU :
41 MSI_ADDR_REDIRECTION_LOWPRI) | 44 MSI_ADDR_REDIRECTION_LOWPRI) |
42 MSI_ADDR_DEST_ID(dest); 45 MSI_ADDR_DEST_ID(cfg->dest_apicid);
43 46
44 msg->data = 47 msg->data =
45 MSI_DATA_TRIGGER_EDGE | 48 MSI_DATA_TRIGGER_EDGE |
@@ -50,180 +53,201 @@ void native_compose_msi_msg(struct pci_dev *pdev,
50 MSI_DATA_VECTOR(cfg->vector); 53 MSI_DATA_VECTOR(cfg->vector);
51} 54}
52 55
53static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, 56/*
54 struct msi_msg *msg, u8 hpet_id) 57 * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
58 * which implement the MSI or MSI-X Capability Structure.
59 */
60static struct irq_chip pci_msi_controller = {
61 .name = "PCI-MSI",
62 .irq_unmask = pci_msi_unmask_irq,
63 .irq_mask = pci_msi_mask_irq,
64 .irq_ack = irq_chip_ack_parent,
65 .irq_retrigger = irq_chip_retrigger_hierarchy,
66 .irq_compose_msi_msg = irq_msi_compose_msg,
67 .flags = IRQCHIP_SKIP_SET_WAKE,
68};
69
70int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
55{ 71{
56 struct irq_cfg *cfg; 72 struct irq_domain *domain;
57 int err; 73 struct irq_alloc_info info;
58 unsigned dest;
59 74
60 if (disable_apic) 75 init_irq_alloc_info(&info, NULL);
61 return -ENXIO; 76 info.type = X86_IRQ_ALLOC_TYPE_MSI;
77 info.msi_dev = dev;
62 78
63 cfg = irq_cfg(irq); 79 domain = irq_remapping_get_irq_domain(&info);
64 err = assign_irq_vector(irq, cfg, apic->target_cpus()); 80 if (domain == NULL)
65 if (err) 81 domain = msi_default_domain;
66 return err; 82 if (domain == NULL)
83 return -ENOSYS;
67 84
68 err = apic->cpu_mask_to_apicid_and(cfg->domain, 85 return pci_msi_domain_alloc_irqs(domain, dev, nvec, type);
69 apic->target_cpus(), &dest); 86}
70 if (err)
71 return err;
72 87
73 x86_msi.compose_msi_msg(pdev, irq, dest, msg, hpet_id); 88void native_teardown_msi_irq(unsigned int irq)
89{
90 irq_domain_free_irqs(irq, 1);
91}
74 92
75 return 0; 93static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
94 msi_alloc_info_t *arg)
95{
96 return arg->msi_hwirq;
76} 97}
77 98
78static int 99static int pci_msi_prepare(struct irq_domain *domain, struct device *dev,
79msi_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force) 100 int nvec, msi_alloc_info_t *arg)
80{ 101{
81 struct irq_cfg *cfg = irqd_cfg(data); 102 struct pci_dev *pdev = to_pci_dev(dev);
82 struct msi_msg msg; 103 struct msi_desc *desc = first_pci_msi_entry(pdev);
83 unsigned int dest; 104
84 int ret; 105 init_irq_alloc_info(arg, NULL);
106 arg->msi_dev = pdev;
107 if (desc->msi_attrib.is_msix) {
108 arg->type = X86_IRQ_ALLOC_TYPE_MSIX;
109 } else {
110 arg->type = X86_IRQ_ALLOC_TYPE_MSI;
111 arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
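		/*
		 * Multi-message MSI encodes the message number in the low
		 * bits of the vector, so a block of up to 32 messages must
		 * occupy consecutive, suitably aligned vectors; the flag
		 * above asks the vector domain for such a contiguous block.
		 * MSI-X has per-vector message storage and does not need it.
		 */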
112 }
85 113
86 ret = apic_set_affinity(data, mask, &dest); 114 return 0;
87 if (ret) 115}
88 return ret;
89 116
90 __get_cached_msi_msg(data->msi_desc, &msg); 117static void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
118{
119 arg->msi_hwirq = pci_msi_domain_calc_hwirq(arg->msi_dev, desc);
120}
121
122static struct msi_domain_ops pci_msi_domain_ops = {
123 .get_hwirq = pci_msi_get_hwirq,
124 .msi_prepare = pci_msi_prepare,
125 .set_desc = pci_msi_set_desc,
126};
91 127
92 msg.data &= ~MSI_DATA_VECTOR_MASK; 128static struct msi_domain_info pci_msi_domain_info = {
93 msg.data |= MSI_DATA_VECTOR(cfg->vector); 129 .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
94 msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; 130 MSI_FLAG_PCI_MSIX,
95 msg.address_lo |= MSI_ADDR_DEST_ID(dest); 131 .ops = &pci_msi_domain_ops,
132 .chip = &pci_msi_controller,
133 .handler = handle_edge_irq,
134 .handler_name = "edge",
135};
96 136
97 __pci_write_msi_msg(data->msi_desc, &msg); 137void arch_init_msi_domain(struct irq_domain *parent)
138{
139 if (disable_apic)
140 return;
98 141
99 return IRQ_SET_MASK_OK_NOCOPY; 142 msi_default_domain = pci_msi_create_irq_domain(NULL,
143 &pci_msi_domain_info, parent);
144 if (!msi_default_domain)
145 pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
100} 146}
101 147
102/* 148#ifdef CONFIG_IRQ_REMAP
103 * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices, 149static struct irq_chip pci_msi_ir_controller = {
104 * which implement the MSI or MSI-X Capability Structure. 150 .name = "IR-PCI-MSI",
105 */
106static struct irq_chip msi_chip = {
107 .name = "PCI-MSI",
108 .irq_unmask = pci_msi_unmask_irq, 151 .irq_unmask = pci_msi_unmask_irq,
109 .irq_mask = pci_msi_mask_irq, 152 .irq_mask = pci_msi_mask_irq,
110 .irq_ack = apic_ack_edge, 153 .irq_ack = irq_chip_ack_parent,
111 .irq_set_affinity = msi_set_affinity, 154 .irq_retrigger = irq_chip_retrigger_hierarchy,
112 .irq_retrigger = apic_retrigger_irq, 155 .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent,
113 .flags = IRQCHIP_SKIP_SET_WAKE, 156 .flags = IRQCHIP_SKIP_SET_WAKE,
114}; 157};
115 158
116int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, 159static struct msi_domain_info pci_msi_ir_domain_info = {
117 unsigned int irq_base, unsigned int irq_offset) 160 .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
118{ 161 MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
119 struct irq_chip *chip = &msi_chip; 162 .ops = &pci_msi_domain_ops,
120 struct msi_msg msg; 163 .chip = &pci_msi_ir_controller,
121 unsigned int irq = irq_base + irq_offset; 164 .handler = handle_edge_irq,
122 int ret; 165 .handler_name = "edge",
123 166};
124 ret = msi_compose_msg(dev, irq, &msg, -1);
125 if (ret < 0)
126 return ret;
127
128 irq_set_msi_desc_off(irq_base, irq_offset, msidesc);
129
130 /*
131 * MSI-X message is written per-IRQ, the offset is always 0.
132 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
133 */
134 if (!irq_offset)
135 pci_write_msi_msg(irq, &msg);
136 167
137 setup_remapped_irq(irq, irq_cfg(irq), chip); 168struct irq_domain *arch_create_msi_irq_domain(struct irq_domain *parent)
169{
170 return pci_msi_create_irq_domain(NULL, &pci_msi_ir_domain_info, parent);
171}
172#endif
138 173
139 irq_set_chip_and_handler_name(irq, chip, handle_edge_irq, "edge"); 174#ifdef CONFIG_DMAR_TABLE
175static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
176{
177 dmar_msi_write(data->irq, msg);
178}
140 179
141 dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n", irq); 180static struct irq_chip dmar_msi_controller = {
181 .name = "DMAR-MSI",
182 .irq_unmask = dmar_msi_unmask,
183 .irq_mask = dmar_msi_mask,
184 .irq_ack = irq_chip_ack_parent,
185 .irq_set_affinity = msi_domain_set_affinity,
186 .irq_retrigger = irq_chip_retrigger_hierarchy,
187 .irq_compose_msi_msg = irq_msi_compose_msg,
188 .irq_write_msi_msg = dmar_msi_write_msg,
189 .flags = IRQCHIP_SKIP_SET_WAKE,
190};
142 191
143 return 0; 192static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
193 msi_alloc_info_t *arg)
194{
195 return arg->dmar_id;
144} 196}
145 197
146int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) 198static int dmar_msi_init(struct irq_domain *domain,
199 struct msi_domain_info *info, unsigned int virq,
200 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
147{ 201{
148 struct msi_desc *msidesc; 202 irq_domain_set_info(domain, virq, arg->dmar_id, info->chip, NULL,
149 unsigned int irq; 203 handle_edge_irq, arg->dmar_data, "edge");
150 int node, ret;
151 204
152 /* Multiple MSI vectors only supported with interrupt remapping */ 205 return 0;
153 if (type == PCI_CAP_ID_MSI && nvec > 1) 206}
154 return 1;
155 207
156 node = dev_to_node(&dev->dev); 208static struct msi_domain_ops dmar_msi_domain_ops = {
209 .get_hwirq = dmar_msi_get_hwirq,
210 .msi_init = dmar_msi_init,
211};
157 212
158 list_for_each_entry(msidesc, &dev->msi_list, list) { 213static struct msi_domain_info dmar_msi_domain_info = {
159 irq = irq_alloc_hwirq(node); 214 .ops = &dmar_msi_domain_ops,
160 if (!irq) 215 .chip = &dmar_msi_controller,
161 return -ENOSPC; 216};
162 217
163 ret = setup_msi_irq(dev, msidesc, irq, 0); 218static struct irq_domain *dmar_get_irq_domain(void)
164 if (ret < 0) { 219{
165 irq_free_hwirq(irq); 220 static struct irq_domain *dmar_domain;
166 return ret; 221 static DEFINE_MUTEX(dmar_lock);
167 }
168 222
169 } 223 mutex_lock(&dmar_lock);
170 return 0; 224 if (dmar_domain == NULL)
171} 225 dmar_domain = msi_create_irq_domain(NULL, &dmar_msi_domain_info,
226 x86_vector_domain);
227 mutex_unlock(&dmar_lock);
172 228
173void native_teardown_msi_irq(unsigned int irq) 229 return dmar_domain;
174{
175 irq_free_hwirq(irq);
176} 230}
177 231
178#ifdef CONFIG_DMAR_TABLE 232int dmar_alloc_hwirq(int id, int node, void *arg)
179static int
180dmar_msi_set_affinity(struct irq_data *data, const struct cpumask *mask,
181 bool force)
182{ 233{
183 struct irq_cfg *cfg = irqd_cfg(data); 234 struct irq_domain *domain = dmar_get_irq_domain();
184 unsigned int dest, irq = data->irq; 235 struct irq_alloc_info info;
185 struct msi_msg msg;
186 int ret;
187
188 ret = apic_set_affinity(data, mask, &dest);
189 if (ret)
190 return ret;
191 236
192 dmar_msi_read(irq, &msg); 237 if (!domain)
238 return -1;
193 239
194 msg.data &= ~MSI_DATA_VECTOR_MASK; 240 init_irq_alloc_info(&info, NULL);
195 msg.data |= MSI_DATA_VECTOR(cfg->vector); 241 info.type = X86_IRQ_ALLOC_TYPE_DMAR;
196 msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; 242 info.dmar_id = id;
197 msg.address_lo |= MSI_ADDR_DEST_ID(dest); 243 info.dmar_data = arg;
198 msg.address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(dest);
199 244
200 dmar_msi_write(irq, &msg); 245 return irq_domain_alloc_irqs(domain, 1, node, &info);
201
202 return IRQ_SET_MASK_OK_NOCOPY;
203} 246}
204 247
205static struct irq_chip dmar_msi_type = { 248void dmar_free_hwirq(int irq)
206 .name = "DMAR_MSI",
207 .irq_unmask = dmar_msi_unmask,
208 .irq_mask = dmar_msi_mask,
209 .irq_ack = apic_ack_edge,
210 .irq_set_affinity = dmar_msi_set_affinity,
211 .irq_retrigger = apic_retrigger_irq,
212 .flags = IRQCHIP_SKIP_SET_WAKE,
213};
214
215int arch_setup_dmar_msi(unsigned int irq)
216{ 249{
217 int ret; 250 irq_domain_free_irqs(irq, 1);
218 struct msi_msg msg;
219
220 ret = msi_compose_msg(NULL, irq, &msg, -1);
221 if (ret < 0)
222 return ret;
223 dmar_msi_write(irq, &msg);
224 irq_set_chip_and_handler_name(irq, &dmar_msi_type, handle_edge_irq,
225 "edge");
226 return 0;
227} 251}
228#endif 252#endif
229 253
@@ -231,56 +255,103 @@ int arch_setup_dmar_msi(unsigned int irq)
231 * MSI message composition 255 * MSI message composition
232 */ 256 */
233#ifdef CONFIG_HPET_TIMER 257#ifdef CONFIG_HPET_TIMER
258static inline int hpet_dev_id(struct irq_domain *domain)
259{
260 struct msi_domain_info *info = msi_get_domain_info(domain);
261
262 return (int)(long)info->data;
263}
234 264
235static int hpet_msi_set_affinity(struct irq_data *data, 265static void hpet_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
236 const struct cpumask *mask, bool force)
237{ 266{
238 struct irq_cfg *cfg = irqd_cfg(data); 267 hpet_msi_write(data->handler_data, msg);
239 struct msi_msg msg; 268}
240 unsigned int dest;
241 int ret;
242 269
243 ret = apic_set_affinity(data, mask, &dest); 270static struct irq_chip hpet_msi_controller = {
244 if (ret) 271 .name = "HPET-MSI",
245 return ret; 272 .irq_unmask = hpet_msi_unmask,
273 .irq_mask = hpet_msi_mask,
274 .irq_ack = irq_chip_ack_parent,
275 .irq_set_affinity = msi_domain_set_affinity,
276 .irq_retrigger = irq_chip_retrigger_hierarchy,
277 .irq_compose_msi_msg = irq_msi_compose_msg,
278 .irq_write_msi_msg = hpet_msi_write_msg,
279 .flags = IRQCHIP_SKIP_SET_WAKE,
280};
246 281
247 hpet_msi_read(data->handler_data, &msg); 282static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
283 msi_alloc_info_t *arg)
284{
285 return arg->hpet_index;
286}
248 287
249 msg.data &= ~MSI_DATA_VECTOR_MASK; 288static int hpet_msi_init(struct irq_domain *domain,
250 msg.data |= MSI_DATA_VECTOR(cfg->vector); 289 struct msi_domain_info *info, unsigned int virq,
251 msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; 290 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
252 msg.address_lo |= MSI_ADDR_DEST_ID(dest); 291{
292 irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
293 irq_domain_set_info(domain, virq, arg->hpet_index, info->chip, NULL,
294 handle_edge_irq, arg->hpet_data, "edge");
253 295
254 hpet_msi_write(data->handler_data, &msg); 296 return 0;
297}
255 298
256 return IRQ_SET_MASK_OK_NOCOPY; 299static void hpet_msi_free(struct irq_domain *domain,
300 struct msi_domain_info *info, unsigned int virq)
301{
302 irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
257} 303}
258 304
259static struct irq_chip hpet_msi_type = { 305static struct msi_domain_ops hpet_msi_domain_ops = {
260 .name = "HPET_MSI", 306 .get_hwirq = hpet_msi_get_hwirq,
261 .irq_unmask = hpet_msi_unmask, 307 .msi_init = hpet_msi_init,
262 .irq_mask = hpet_msi_mask, 308 .msi_free = hpet_msi_free,
263 .irq_ack = apic_ack_edge, 309};
264 .irq_set_affinity = hpet_msi_set_affinity, 310
265 .irq_retrigger = apic_retrigger_irq, 311static struct msi_domain_info hpet_msi_domain_info = {
266 .flags = IRQCHIP_SKIP_SET_WAKE, 312 .ops = &hpet_msi_domain_ops,
313 .chip = &hpet_msi_controller,
267}; 314};
268 315
269int default_setup_hpet_msi(unsigned int irq, unsigned int id) 316struct irq_domain *hpet_create_irq_domain(int hpet_id)
270{ 317{
271 struct irq_chip *chip = &hpet_msi_type; 318 struct irq_domain *parent;
272 struct msi_msg msg; 319 struct irq_alloc_info info;
273 int ret; 320 struct msi_domain_info *domain_info;
321
322 if (x86_vector_domain == NULL)
323 return NULL;
324
325 domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
326 if (!domain_info)
327 return NULL;
328
329 *domain_info = hpet_msi_domain_info;
330 domain_info->data = (void *)(long)hpet_id;
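	/*
	 * Stash the HPET block id in the per-domain msi_domain_info so that
	 * hpet_dev_id() can recover it later when a comparator allocates an
	 * interrupt via hpet_assign_irq() below.
	 */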
331
332 init_irq_alloc_info(&info, NULL);
333 info.type = X86_IRQ_ALLOC_TYPE_HPET;
334 info.hpet_id = hpet_id;
335 parent = irq_remapping_get_ir_irq_domain(&info);
336 if (parent == NULL)
337 parent = x86_vector_domain;
338 else
339 hpet_msi_controller.name = "IR-HPET-MSI";
340
341 return msi_create_irq_domain(NULL, domain_info, parent);
342}
274 343
275 ret = msi_compose_msg(NULL, irq, &msg, id); 344int hpet_assign_irq(struct irq_domain *domain, struct hpet_dev *dev,
276 if (ret < 0) 345 int dev_num)
277 return ret; 346{
347 struct irq_alloc_info info;
278 348
279 hpet_msi_write(irq_get_handler_data(irq), &msg); 349 init_irq_alloc_info(&info, NULL);
280 irq_set_status_flags(irq, IRQ_MOVE_PCNTXT); 350 info.type = X86_IRQ_ALLOC_TYPE_HPET;
281 setup_remapped_irq(irq, irq_cfg(irq), chip); 351 info.hpet_data = dev;
352 info.hpet_id = hpet_dev_id(domain);
353 info.hpet_index = dev_num;
282 354
283 irq_set_chip_and_handler_name(irq, chip, handle_edge_irq, "edge"); 355 return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
284 return 0;
285} 356}
286#endif 357#endif
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..28eba2d38b15 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -3,6 +3,8 @@
3 * 3 *
4 * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo 4 * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo
5 * Moved from arch/x86/kernel/apic/io_apic.c. 5 * Moved from arch/x86/kernel/apic/io_apic.c.
6 * Jiang Liu <jiang.liu@linux.intel.com>
7 * Enable support of hierarchical irqdomains
6 * 8 *
7 * This program is free software; you can redistribute it and/or modify 9 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 10 * it under the terms of the GNU General Public License version 2 as
@@ -11,15 +13,28 @@
11#include <linux/interrupt.h> 13#include <linux/interrupt.h>
12#include <linux/init.h> 14#include <linux/init.h>
13#include <linux/compiler.h> 15#include <linux/compiler.h>
14#include <linux/irqdomain.h>
15#include <linux/slab.h> 16#include <linux/slab.h>
17#include <asm/irqdomain.h>
16#include <asm/hw_irq.h> 18#include <asm/hw_irq.h>
17#include <asm/apic.h> 19#include <asm/apic.h>
18#include <asm/i8259.h> 20#include <asm/i8259.h>
19#include <asm/desc.h> 21#include <asm/desc.h>
20#include <asm/irq_remapping.h> 22#include <asm/irq_remapping.h>
21 23
24struct apic_chip_data {
25 struct irq_cfg cfg;
26 cpumask_var_t domain;
27 cpumask_var_t old_domain;
28 u8 move_in_progress : 1;
29};
30
31struct irq_domain *x86_vector_domain;
22static DEFINE_RAW_SPINLOCK(vector_lock); 32static DEFINE_RAW_SPINLOCK(vector_lock);
33static cpumask_var_t vector_cpumask;
34static struct irq_chip lapic_controller;
35#ifdef CONFIG_X86_IO_APIC
36static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
37#endif
23 38
24void lock_vector_lock(void) 39void lock_vector_lock(void)
25{ 40{
@@ -34,71 +49,59 @@ void unlock_vector_lock(void)
34 raw_spin_unlock(&vector_lock); 49 raw_spin_unlock(&vector_lock);
35} 50}
36 51
37struct irq_cfg *irq_cfg(unsigned int irq) 52static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data)
38{ 53{
39 return irq_get_chip_data(irq); 54 if (!irq_data)
55 return NULL;
56
57 while (irq_data->parent_data)
58 irq_data = irq_data->parent_data;
59
60 return irq_data->chip_data;
40} 61}
41 62
42struct irq_cfg *irqd_cfg(struct irq_data *irq_data) 63struct irq_cfg *irqd_cfg(struct irq_data *irq_data)
43{ 64{
44 return irq_data->chip_data; 65 struct apic_chip_data *data = apic_chip_data(irq_data);
66
67 return data ? &data->cfg : NULL;
45} 68}
46 69
47static struct irq_cfg *alloc_irq_cfg(unsigned int irq, int node) 70struct irq_cfg *irq_cfg(unsigned int irq)
48{ 71{
49 struct irq_cfg *cfg; 72 return irqd_cfg(irq_get_irq_data(irq));
73}
50 74
51 cfg = kzalloc_node(sizeof(*cfg), GFP_KERNEL, node); 75static struct apic_chip_data *alloc_apic_chip_data(int node)
52 if (!cfg) 76{
77 struct apic_chip_data *data;
78
79 data = kzalloc_node(sizeof(*data), GFP_KERNEL, node);
80 if (!data)
53 return NULL; 81 return NULL;
54 if (!zalloc_cpumask_var_node(&cfg->domain, GFP_KERNEL, node)) 82 if (!zalloc_cpumask_var_node(&data->domain, GFP_KERNEL, node))
55 goto out_cfg; 83 goto out_data;
56 if (!zalloc_cpumask_var_node(&cfg->old_domain, GFP_KERNEL, node)) 84 if (!zalloc_cpumask_var_node(&data->old_domain, GFP_KERNEL, node))
57 goto out_domain; 85 goto out_domain;
58#ifdef CONFIG_X86_IO_APIC 86 return data;
59 INIT_LIST_HEAD(&cfg->irq_2_pin);
60#endif
61 return cfg;
62out_domain: 87out_domain:
63 free_cpumask_var(cfg->domain); 88 free_cpumask_var(data->domain);
64out_cfg: 89out_data:
65 kfree(cfg); 90 kfree(data);
66 return NULL; 91 return NULL;
67} 92}
68 93
69struct irq_cfg *alloc_irq_and_cfg_at(unsigned int at, int node) 94static void free_apic_chip_data(struct apic_chip_data *data)
70{ 95{
71 int res = irq_alloc_desc_at(at, node); 96 if (data) {
72 struct irq_cfg *cfg; 97 free_cpumask_var(data->domain);
73 98 free_cpumask_var(data->old_domain);
74 if (res < 0) { 99 kfree(data);
75 if (res != -EEXIST)
76 return NULL;
77 cfg = irq_cfg(at);
78 if (cfg)
79 return cfg;
80 } 100 }
81
82 cfg = alloc_irq_cfg(at, node);
83 if (cfg)
84 irq_set_chip_data(at, cfg);
85 else
86 irq_free_desc(at);
87 return cfg;
88}
89
90static void free_irq_cfg(unsigned int at, struct irq_cfg *cfg)
91{
92 if (!cfg)
93 return;
94 irq_set_chip_data(at, NULL);
95 free_cpumask_var(cfg->domain);
96 free_cpumask_var(cfg->old_domain);
97 kfree(cfg);
98} 101}
99 102
100static int 103static int __assign_irq_vector(int irq, struct apic_chip_data *d,
101__assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask) 104 const struct cpumask *mask)
102{ 105{
103 /* 106 /*
104 * NOTE! The local APIC isn't very good at handling 107 * NOTE! The local APIC isn't very good at handling
@@ -114,36 +117,33 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
114 static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START; 117 static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
115 static int current_offset = VECTOR_OFFSET_START % 16; 118 static int current_offset = VECTOR_OFFSET_START % 16;
116 int cpu, err; 119 int cpu, err;
117 cpumask_var_t tmp_mask;
118 120
119 if (cfg->move_in_progress) 121 if (d->move_in_progress)
120 return -EBUSY; 122 return -EBUSY;
121 123
122 if (!alloc_cpumask_var(&tmp_mask, GFP_ATOMIC))
123 return -ENOMEM;
124
125 /* Only try and allocate irqs on cpus that are present */ 124 /* Only try and allocate irqs on cpus that are present */
126 err = -ENOSPC; 125 err = -ENOSPC;
127 cpumask_clear(cfg->old_domain); 126 cpumask_clear(d->old_domain);
128 cpu = cpumask_first_and(mask, cpu_online_mask); 127 cpu = cpumask_first_and(mask, cpu_online_mask);
129 while (cpu < nr_cpu_ids) { 128 while (cpu < nr_cpu_ids) {
130 int new_cpu, vector, offset; 129 int new_cpu, vector, offset;
131 130
132 apic->vector_allocation_domain(cpu, tmp_mask, mask); 131 apic->vector_allocation_domain(cpu, vector_cpumask, mask);
133 132
134 if (cpumask_subset(tmp_mask, cfg->domain)) { 133 if (cpumask_subset(vector_cpumask, d->domain)) {
135 err = 0; 134 err = 0;
136 if (cpumask_equal(tmp_mask, cfg->domain)) 135 if (cpumask_equal(vector_cpumask, d->domain))
137 break; 136 break;
138 /* 137 /*
139 * New cpumask using the vector is a proper subset of 138 * New cpumask using the vector is a proper subset of
140 * the current in use mask. So cleanup the vector 139 * the current in use mask. So cleanup the vector
141 * allocation for the members that are not used anymore. 140 * allocation for the members that are not used anymore.
142 */ 141 */
143 cpumask_andnot(cfg->old_domain, cfg->domain, tmp_mask); 142 cpumask_andnot(d->old_domain, d->domain,
144 cfg->move_in_progress = 143 vector_cpumask);
145 cpumask_intersects(cfg->old_domain, cpu_online_mask); 144 d->move_in_progress =
146 cpumask_and(cfg->domain, cfg->domain, tmp_mask); 145 cpumask_intersects(d->old_domain, cpu_online_mask);
146 cpumask_and(d->domain, d->domain, vector_cpumask);
147 break; 147 break;
148 } 148 }
149 149
@@ -157,16 +157,18 @@ next:
157 } 157 }
158 158
159 if (unlikely(current_vector == vector)) { 159 if (unlikely(current_vector == vector)) {
160 cpumask_or(cfg->old_domain, cfg->old_domain, tmp_mask); 160 cpumask_or(d->old_domain, d->old_domain,
161 cpumask_andnot(tmp_mask, mask, cfg->old_domain); 161 vector_cpumask);
162 cpu = cpumask_first_and(tmp_mask, cpu_online_mask); 162 cpumask_andnot(vector_cpumask, mask, d->old_domain);
163 cpu = cpumask_first_and(vector_cpumask,
164 cpu_online_mask);
163 continue; 165 continue;
164 } 166 }
165 167
166 if (test_bit(vector, used_vectors)) 168 if (test_bit(vector, used_vectors))
167 goto next; 169 goto next;
168 170
169 for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask) { 171 for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
170 if (per_cpu(vector_irq, new_cpu)[vector] > 172 if (per_cpu(vector_irq, new_cpu)[vector] >
171 VECTOR_UNDEFINED) 173 VECTOR_UNDEFINED)
172 goto next; 174 goto next;
@@ -174,55 +176,73 @@ next:
174 /* Found one! */ 176 /* Found one! */
175 current_vector = vector; 177 current_vector = vector;
176 current_offset = offset; 178 current_offset = offset;
177 if (cfg->vector) { 179 if (d->cfg.vector) {
178 cpumask_copy(cfg->old_domain, cfg->domain); 180 cpumask_copy(d->old_domain, d->domain);
179 cfg->move_in_progress = 181 d->move_in_progress =
180 cpumask_intersects(cfg->old_domain, cpu_online_mask); 182 cpumask_intersects(d->old_domain, cpu_online_mask);
181 } 183 }
182 for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask) 184 for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
183 per_cpu(vector_irq, new_cpu)[vector] = irq; 185 per_cpu(vector_irq, new_cpu)[vector] = irq;
184 cfg->vector = vector; 186 d->cfg.vector = vector;
185 cpumask_copy(cfg->domain, tmp_mask); 187 cpumask_copy(d->domain, vector_cpumask);
186 err = 0; 188 err = 0;
187 break; 189 break;
188 } 190 }
189 free_cpumask_var(tmp_mask); 191
192 if (!err) {
193 /* cache destination APIC IDs into cfg->dest_apicid */
194 err = apic->cpu_mask_to_apicid_and(mask, d->domain,
195 &d->cfg.dest_apicid);
196 }
190 197
191 return err; 198 return err;
192} 199}
193 200
194int assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask) 201static int assign_irq_vector(int irq, struct apic_chip_data *data,
202 const struct cpumask *mask)
195{ 203{
196 int err; 204 int err;
197 unsigned long flags; 205 unsigned long flags;
198 206
199 raw_spin_lock_irqsave(&vector_lock, flags); 207 raw_spin_lock_irqsave(&vector_lock, flags);
200 err = __assign_irq_vector(irq, cfg, mask); 208 err = __assign_irq_vector(irq, data, mask);
201 raw_spin_unlock_irqrestore(&vector_lock, flags); 209 raw_spin_unlock_irqrestore(&vector_lock, flags);
202 return err; 210 return err;
203} 211}
204 212
205void clear_irq_vector(int irq, struct irq_cfg *cfg) 213static int assign_irq_vector_policy(int irq, int node,
214 struct apic_chip_data *data,
215 struct irq_alloc_info *info)
216{
217 if (info && info->mask)
218 return assign_irq_vector(irq, data, info->mask);
219 if (node != NUMA_NO_NODE &&
220 assign_irq_vector(irq, data, cpumask_of_node(node)) == 0)
221 return 0;
222 return assign_irq_vector(irq, data, apic->target_cpus());
223}
224
225static void clear_irq_vector(int irq, struct apic_chip_data *data)
206{ 226{
207 int cpu, vector; 227 int cpu, vector;
208 unsigned long flags; 228 unsigned long flags;
209 229
210 raw_spin_lock_irqsave(&vector_lock, flags); 230 raw_spin_lock_irqsave(&vector_lock, flags);
211 BUG_ON(!cfg->vector); 231 BUG_ON(!data->cfg.vector);
212 232
213 vector = cfg->vector; 233 vector = data->cfg.vector;
214 for_each_cpu_and(cpu, cfg->domain, cpu_online_mask) 234 for_each_cpu_and(cpu, data->domain, cpu_online_mask)
215 per_cpu(vector_irq, cpu)[vector] = VECTOR_UNDEFINED; 235 per_cpu(vector_irq, cpu)[vector] = VECTOR_UNDEFINED;
216 236
217 cfg->vector = 0; 237 data->cfg.vector = 0;
218 cpumask_clear(cfg->domain); 238 cpumask_clear(data->domain);
219 239
220 if (likely(!cfg->move_in_progress)) { 240 if (likely(!data->move_in_progress)) {
221 raw_spin_unlock_irqrestore(&vector_lock, flags); 241 raw_spin_unlock_irqrestore(&vector_lock, flags);
222 return; 242 return;
223 } 243 }
224 244
225 for_each_cpu_and(cpu, cfg->old_domain, cpu_online_mask) { 245 for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
226 for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; 246 for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
227 vector++) { 247 vector++) {
228 if (per_cpu(vector_irq, cpu)[vector] != irq) 248 if (per_cpu(vector_irq, cpu)[vector] != irq)
@@ -231,10 +251,95 @@ void clear_irq_vector(int irq, struct irq_cfg *cfg)
231 break; 251 break;
232 } 252 }
233 } 253 }
234 cfg->move_in_progress = 0; 254 data->move_in_progress = 0;
235 raw_spin_unlock_irqrestore(&vector_lock, flags); 255 raw_spin_unlock_irqrestore(&vector_lock, flags);
236} 256}
237 257
258void init_irq_alloc_info(struct irq_alloc_info *info,
259 const struct cpumask *mask)
260{
261 memset(info, 0, sizeof(*info));
262 info->mask = mask;
263}
264
265void copy_irq_alloc_info(struct irq_alloc_info *dst, struct irq_alloc_info *src)
266{
267 if (src)
268 *dst = *src;
269 else
270 memset(dst, 0, sizeof(*dst));
271}
272
273static void x86_vector_free_irqs(struct irq_domain *domain,
274 unsigned int virq, unsigned int nr_irqs)
275{
276 struct irq_data *irq_data;
277 int i;
278
279 for (i = 0; i < nr_irqs; i++) {
280 irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
281 if (irq_data && irq_data->chip_data) {
282 clear_irq_vector(virq + i, irq_data->chip_data);
283 free_apic_chip_data(irq_data->chip_data);
284#ifdef CONFIG_X86_IO_APIC
285 if (virq + i < nr_legacy_irqs())
286 legacy_irq_data[virq + i] = NULL;
287#endif
288 irq_domain_reset_irq_data(irq_data);
289 }
290 }
291}
292
293static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
294 unsigned int nr_irqs, void *arg)
295{
296 struct irq_alloc_info *info = arg;
297 struct apic_chip_data *data;
298 struct irq_data *irq_data;
299 int i, err;
300
301 if (disable_apic)
302 return -ENXIO;
303
304 /* Currently vector allocator can't guarantee contiguous allocations */
305 if ((info->flags & X86_IRQ_ALLOC_CONTIGUOUS_VECTORS) && nr_irqs > 1)
306 return -ENOSYS;
307
308 for (i = 0; i < nr_irqs; i++) {
309 irq_data = irq_domain_get_irq_data(domain, virq + i);
310 BUG_ON(!irq_data);
311#ifdef CONFIG_X86_IO_APIC
312 if (virq + i < nr_legacy_irqs() && legacy_irq_data[virq + i])
313 data = legacy_irq_data[virq + i];
314 else
315#endif
316 data = alloc_apic_chip_data(irq_data->node);
317 if (!data) {
318 err = -ENOMEM;
319 goto error;
320 }
321
322 irq_data->chip = &lapic_controller;
323 irq_data->chip_data = data;
324 irq_data->hwirq = virq + i;
325 err = assign_irq_vector_policy(virq, irq_data->node, data,
326 info);
327 if (err)
328 goto error;
329 }
330
331 return 0;
332
333error:
334 x86_vector_free_irqs(domain, virq, i + 1);
335 return err;
336}
337
338static const struct irq_domain_ops x86_vector_domain_ops = {
339 .alloc = x86_vector_alloc_irqs,
340 .free = x86_vector_free_irqs,
341};
342
238int __init arch_probe_nr_irqs(void) 343int __init arch_probe_nr_irqs(void)
239{ 344{
240 int nr; 345 int nr;
@@ -258,8 +363,43 @@ int __init arch_probe_nr_irqs(void)
258 return nr_legacy_irqs(); 363 return nr_legacy_irqs();
259} 364}
260 365
366#ifdef CONFIG_X86_IO_APIC
367static void init_legacy_irqs(void)
368{
369 int i, node = cpu_to_node(0);
370 struct apic_chip_data *data;
371
372 /*
 373	 * For legacy IRQs, start with assigning irq0 to irq15 to
 374	 * ISA_IRQ_VECTOR(i) for all CPUs.
375 */
376 for (i = 0; i < nr_legacy_irqs(); i++) {
377 data = legacy_irq_data[i] = alloc_apic_chip_data(node);
378 BUG_ON(!data);
379
380 data->cfg.vector = ISA_IRQ_VECTOR(i);
381 cpumask_setall(data->domain);
382 irq_set_chip_data(i, data);
383 }
384}
385#else
386static void init_legacy_irqs(void) { }
387#endif
388
261int __init arch_early_irq_init(void) 389int __init arch_early_irq_init(void)
262{ 390{
391 init_legacy_irqs();
392
393 x86_vector_domain = irq_domain_add_tree(NULL, &x86_vector_domain_ops,
394 NULL);
395 BUG_ON(x86_vector_domain == NULL);
396 irq_set_default_host(x86_vector_domain);
397
398 arch_init_msi_domain(x86_vector_domain);
399 arch_init_htirq_domain(x86_vector_domain);
400
401 BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
402
263 return arch_early_ioapic_init(); 403 return arch_early_ioapic_init();
264} 404}
265 405
@@ -267,7 +407,7 @@ static void __setup_vector_irq(int cpu)
267{ 407{
268 /* Initialize vector_irq on a new cpu */ 408 /* Initialize vector_irq on a new cpu */
269 int irq, vector; 409 int irq, vector;
270 struct irq_cfg *cfg; 410 struct apic_chip_data *data;
271 411
272 /* 412 /*
273 * vector_lock will make sure that we don't run into irq vector 413 * vector_lock will make sure that we don't run into irq vector
@@ -277,13 +417,13 @@ static void __setup_vector_irq(int cpu)
277 raw_spin_lock(&vector_lock); 417 raw_spin_lock(&vector_lock);
278 /* Mark the inuse vectors */ 418 /* Mark the inuse vectors */
279 for_each_active_irq(irq) { 419 for_each_active_irq(irq) {
280 cfg = irq_cfg(irq); 420 data = apic_chip_data(irq_get_irq_data(irq));
281 if (!cfg) 421 if (!data)
282 continue; 422 continue;
283 423
284 if (!cpumask_test_cpu(cpu, cfg->domain)) 424 if (!cpumask_test_cpu(cpu, data->domain))
285 continue; 425 continue;
286 vector = cfg->vector; 426 vector = data->cfg.vector;
287 per_cpu(vector_irq, cpu)[vector] = irq; 427 per_cpu(vector_irq, cpu)[vector] = irq;
288 } 428 }
289 /* Mark the free vectors */ 429 /* Mark the free vectors */
@@ -292,8 +432,8 @@ static void __setup_vector_irq(int cpu)
292 if (irq <= VECTOR_UNDEFINED) 432 if (irq <= VECTOR_UNDEFINED)
293 continue; 433 continue;
294 434
295 cfg = irq_cfg(irq); 435 data = apic_chip_data(irq_get_irq_data(irq));
296 if (!cpumask_test_cpu(cpu, cfg->domain)) 436 if (!cpumask_test_cpu(cpu, data->domain))
297 per_cpu(vector_irq, cpu)[vector] = VECTOR_UNDEFINED; 437 per_cpu(vector_irq, cpu)[vector] = VECTOR_UNDEFINED;
298 } 438 }
299 raw_spin_unlock(&vector_lock); 439 raw_spin_unlock(&vector_lock);
@@ -314,20 +454,20 @@ void setup_vector_irq(int cpu)
314 * legacy vector to irq mapping: 454 * legacy vector to irq mapping:
315 */ 455 */
316 for (irq = 0; irq < nr_legacy_irqs(); irq++) 456 for (irq = 0; irq < nr_legacy_irqs(); irq++)
317 per_cpu(vector_irq, cpu)[IRQ0_VECTOR + irq] = irq; 457 per_cpu(vector_irq, cpu)[ISA_IRQ_VECTOR(irq)] = irq;
318 458
319 __setup_vector_irq(cpu); 459 __setup_vector_irq(cpu);
320} 460}
321 461
322int apic_retrigger_irq(struct irq_data *data) 462static int apic_retrigger_irq(struct irq_data *irq_data)
323{ 463{
324 struct irq_cfg *cfg = irqd_cfg(data); 464 struct apic_chip_data *data = apic_chip_data(irq_data);
325 unsigned long flags; 465 unsigned long flags;
326 int cpu; 466 int cpu;
327 467
328 raw_spin_lock_irqsave(&vector_lock, flags); 468 raw_spin_lock_irqsave(&vector_lock, flags);
329 cpu = cpumask_first_and(cfg->domain, cpu_online_mask); 469 cpu = cpumask_first_and(data->domain, cpu_online_mask);
330 apic->send_IPI_mask(cpumask_of(cpu), cfg->vector); 470 apic->send_IPI_mask(cpumask_of(cpu), data->cfg.vector);
331 raw_spin_unlock_irqrestore(&vector_lock, flags); 471 raw_spin_unlock_irqrestore(&vector_lock, flags);
332 472
333 return 1; 473 return 1;
@@ -340,73 +480,76 @@ void apic_ack_edge(struct irq_data *data)
340 ack_APIC_irq(); 480 ack_APIC_irq();
341} 481}
342 482
343/* 483static int apic_set_affinity(struct irq_data *irq_data,
344 * Either sets data->affinity to a valid value, and returns 484 const struct cpumask *dest, bool force)
345 * ->cpu_mask_to_apicid of that in dest_id, or returns -1 and
346 * leaves data->affinity untouched.
347 */
348int apic_set_affinity(struct irq_data *data, const struct cpumask *mask,
349 unsigned int *dest_id)
350{ 485{
351 struct irq_cfg *cfg = irqd_cfg(data); 486 struct apic_chip_data *data = irq_data->chip_data;
352 unsigned int irq = data->irq; 487 int err, irq = irq_data->irq;
353 int err;
354 488
355 if (!config_enabled(CONFIG_SMP)) 489 if (!config_enabled(CONFIG_SMP))
356 return -EPERM; 490 return -EPERM;
357 491
358 if (!cpumask_intersects(mask, cpu_online_mask)) 492 if (!cpumask_intersects(dest, cpu_online_mask))
359 return -EINVAL; 493 return -EINVAL;
360 494
361 err = assign_irq_vector(irq, cfg, mask); 495 err = assign_irq_vector(irq, data, dest);
362 if (err)
363 return err;
364
365 err = apic->cpu_mask_to_apicid_and(mask, cfg->domain, dest_id);
366 if (err) { 496 if (err) {
367 if (assign_irq_vector(irq, cfg, data->affinity)) 497 struct irq_data *top = irq_get_irq_data(irq);
498
499 if (assign_irq_vector(irq, data, top->affinity))
368 pr_err("Failed to recover vector for irq %d\n", irq); 500 pr_err("Failed to recover vector for irq %d\n", irq);
369 return err; 501 return err;
370 } 502 }
371 503
372 cpumask_copy(data->affinity, mask); 504 return IRQ_SET_MASK_OK;
373
374 return 0;
375} 505}
376 506
507static struct irq_chip lapic_controller = {
508 .irq_ack = apic_ack_edge,
509 .irq_set_affinity = apic_set_affinity,
510 .irq_retrigger = apic_retrigger_irq,
511};
512
377#ifdef CONFIG_SMP 513#ifdef CONFIG_SMP
378void send_cleanup_vector(struct irq_cfg *cfg) 514static void __send_cleanup_vector(struct apic_chip_data *data)
379{ 515{
380 cpumask_var_t cleanup_mask; 516 cpumask_var_t cleanup_mask;
381 517
382 if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) { 518 if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
383 unsigned int i; 519 unsigned int i;
384 520
385 for_each_cpu_and(i, cfg->old_domain, cpu_online_mask) 521 for_each_cpu_and(i, data->old_domain, cpu_online_mask)
386 apic->send_IPI_mask(cpumask_of(i), 522 apic->send_IPI_mask(cpumask_of(i),
387 IRQ_MOVE_CLEANUP_VECTOR); 523 IRQ_MOVE_CLEANUP_VECTOR);
388 } else { 524 } else {
389 cpumask_and(cleanup_mask, cfg->old_domain, cpu_online_mask); 525 cpumask_and(cleanup_mask, data->old_domain, cpu_online_mask);
390 apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR); 526 apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
391 free_cpumask_var(cleanup_mask); 527 free_cpumask_var(cleanup_mask);
392 } 528 }
393 cfg->move_in_progress = 0; 529 data->move_in_progress = 0;
530}
531
532void send_cleanup_vector(struct irq_cfg *cfg)
533{
534 struct apic_chip_data *data;
535
536 data = container_of(cfg, struct apic_chip_data, cfg);
537 if (data->move_in_progress)
538 __send_cleanup_vector(data);
394} 539}
395 540
396asmlinkage __visible void smp_irq_move_cleanup_interrupt(void) 541asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
397{ 542{
398 unsigned vector, me; 543 unsigned vector, me;
399 544
400 ack_APIC_irq(); 545 entering_ack_irq();
401 irq_enter();
402 exit_idle();
403 546
404 me = smp_processor_id(); 547 me = smp_processor_id();
405 for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) { 548 for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
406 int irq; 549 int irq;
407 unsigned int irr; 550 unsigned int irr;
408 struct irq_desc *desc; 551 struct irq_desc *desc;
409 struct irq_cfg *cfg; 552 struct apic_chip_data *data;
410 553
411 irq = __this_cpu_read(vector_irq[vector]); 554 irq = __this_cpu_read(vector_irq[vector]);
412 555
@@ -417,8 +560,8 @@ asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
417 if (!desc) 560 if (!desc)
418 continue; 561 continue;
419 562
420 cfg = irq_cfg(irq); 563 data = apic_chip_data(&desc->irq_data);
421 if (!cfg) 564 if (!data)
422 continue; 565 continue;
423 566
424 raw_spin_lock(&desc->lock); 567 raw_spin_lock(&desc->lock);
@@ -427,10 +570,11 @@ asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
427 * Check if the irq migration is in progress. If so, we 570 * Check if the irq migration is in progress. If so, we
428 * haven't received the cleanup request yet for this irq. 571 * haven't received the cleanup request yet for this irq.
429 */ 572 */
430 if (cfg->move_in_progress) 573 if (data->move_in_progress)
431 goto unlock; 574 goto unlock;
432 575
433 if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain)) 576 if (vector == data->cfg.vector &&
577 cpumask_test_cpu(me, data->domain))
434 goto unlock; 578 goto unlock;
435 579
436 irr = apic_read(APIC_IRR + (vector / 32 * 0x10)); 580 irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
@@ -450,20 +594,21 @@ unlock:
450 raw_spin_unlock(&desc->lock); 594 raw_spin_unlock(&desc->lock);
451 } 595 }
452 596
453 irq_exit(); 597 exiting_irq();
454} 598}
455 599
456static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector) 600static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
457{ 601{
458 unsigned me; 602 unsigned me;
603 struct apic_chip_data *data;
459 604
460 if (likely(!cfg->move_in_progress)) 605 data = container_of(cfg, struct apic_chip_data, cfg);
606 if (likely(!data->move_in_progress))
461 return; 607 return;
462 608
463 me = smp_processor_id(); 609 me = smp_processor_id();
464 610 if (vector == data->cfg.vector && cpumask_test_cpu(me, data->domain))
465 if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain)) 611 __send_cleanup_vector(data);
466 send_cleanup_vector(cfg);
467} 612}
468 613
469void irq_complete_move(struct irq_cfg *cfg) 614void irq_complete_move(struct irq_cfg *cfg)
@@ -475,46 +620,11 @@ void irq_force_complete_move(int irq)
475{ 620{
476 struct irq_cfg *cfg = irq_cfg(irq); 621 struct irq_cfg *cfg = irq_cfg(irq);
477 622
478 if (!cfg) 623 if (cfg)
479 return; 624 __irq_complete_move(cfg, cfg->vector);
480
481 __irq_complete_move(cfg, cfg->vector);
482} 625}
483#endif 626#endif
484 627
485/*
486 * Dynamic irq allocate and deallocation. Should be replaced by irq domains!
487 */
488int arch_setup_hwirq(unsigned int irq, int node)
489{
490 struct irq_cfg *cfg;
491 unsigned long flags;
492 int ret;
493
494 cfg = alloc_irq_cfg(irq, node);
495 if (!cfg)
496 return -ENOMEM;
497
498 raw_spin_lock_irqsave(&vector_lock, flags);
499 ret = __assign_irq_vector(irq, cfg, apic->target_cpus());
500 raw_spin_unlock_irqrestore(&vector_lock, flags);
501
502 if (!ret)
503 irq_set_chip_data(irq, cfg);
504 else
505 free_irq_cfg(irq, cfg);
506 return ret;
507}
508
509void arch_teardown_hwirq(unsigned int irq)
510{
511 struct irq_cfg *cfg = irq_cfg(irq);
512
513 free_remapped_irq(irq);
514 clear_irq_vector(irq, cfg);
515 free_irq_cfg(irq, cfg);
516}
517
518static void __init print_APIC_field(int base) 628static void __init print_APIC_field(int base)
519{ 629{
520 int i; 630 int i;
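After this conversion, child domains (IOAPIC, MSI, HPET, DMAR) no longer call assign_irq_vector() directly; they pass an irq_alloc_info down to x86_vector_domain and let the parent pick the vector and destination. A hedged sketch of what a child domain's .alloc callback does in this model; irq_domain_alloc_irqs_parent() is the generic helper used for the parent call, while the function name and the "program the hardware" step are illustrative assumptions.

/*
 * Sketch of a child-domain .alloc callback in the hierarchical model.
 */
static int example_child_domain_alloc(struct irq_domain *domain,
				      unsigned int virq, unsigned int nr_irqs,
				      void *arg)
{
	struct irq_alloc_info *info = arg;
	int ret;

	/* Let the parent (vector) domain assign vector + destination first. */
	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
	if (ret < 0)
		return ret;

	/* ...then program the child-specific hardware (RTE, MSI message, ...). */
	return 0;
}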
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 6fae733e9194..3ffd925655e0 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -21,11 +21,13 @@ early_param("x2apic_phys", set_x2apic_phys_mode);
21 21
22static bool x2apic_fadt_phys(void) 22static bool x2apic_fadt_phys(void)
23{ 23{
24#ifdef CONFIG_ACPI
24 if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) && 25 if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
25 (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) { 26 (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) {
26 printk(KERN_DEBUG "System requires x2apic physical mode\n"); 27 printk(KERN_DEBUG "System requires x2apic physical mode\n");
27 return true; 28 return true;
28 } 29 }
30#endif
29 return false; 31 return false;
30} 32}
31 33
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index b27f6ec90caa..8e3d22a1af94 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -68,7 +68,9 @@ void common(void) {
68 OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable); 68 OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
69 OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable); 69 OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable);
70 OFFSET(PV_CPU_iret, pv_cpu_ops, iret); 70 OFFSET(PV_CPU_iret, pv_cpu_ops, iret);
71#ifdef CONFIG_X86_32
71 OFFSET(PV_CPU_irq_enable_sysexit, pv_cpu_ops, irq_enable_sysexit); 72 OFFSET(PV_CPU_irq_enable_sysexit, pv_cpu_ops, irq_enable_sysexit);
73#endif
72 OFFSET(PV_CPU_read_cr0, pv_cpu_ops, read_cr0); 74 OFFSET(PV_CPU_read_cr0, pv_cpu_ops, read_cr0);
73 OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2); 75 OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2);
74#endif 76#endif
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index dcaab87da629..d8f42f902a0f 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -66,7 +66,7 @@ int main(void)
66 DEFINE(__NR_syscall_max, sizeof(syscalls_64) - 1); 66 DEFINE(__NR_syscall_max, sizeof(syscalls_64) - 1);
67 DEFINE(NR_syscalls, sizeof(syscalls_64)); 67 DEFINE(NR_syscalls, sizeof(syscalls_64));
68 68
69 DEFINE(__NR_ia32_syscall_max, sizeof(syscalls_ia32) - 1); 69 DEFINE(__NR_syscall_compat_max, sizeof(syscalls_ia32) - 1);
70 DEFINE(IA32_NR_syscalls, sizeof(syscalls_ia32)); 70 DEFINE(IA32_NR_syscalls, sizeof(syscalls_ia32));
71 71
72 return 0; 72 return 0;
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 56cae1964a81..dd3a4baffe50 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -295,7 +295,7 @@ static int nearby_node(int apicid)
295 * Assumption: Number of cores in each internal node is the same. 295 * Assumption: Number of cores in each internal node is the same.
296 * (2) AMD processors supporting compute units 296 * (2) AMD processors supporting compute units
297 */ 297 */
298#ifdef CONFIG_X86_HT 298#ifdef CONFIG_SMP
299static void amd_get_topology(struct cpuinfo_x86 *c) 299static void amd_get_topology(struct cpuinfo_x86 *c)
300{ 300{
301 u32 cores_per_cu = 1; 301 u32 cores_per_cu = 1;
@@ -348,7 +348,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
348 */ 348 */
349static void amd_detect_cmp(struct cpuinfo_x86 *c) 349static void amd_detect_cmp(struct cpuinfo_x86 *c)
350{ 350{
351#ifdef CONFIG_X86_HT 351#ifdef CONFIG_SMP
352 unsigned bits; 352 unsigned bits;
353 int cpu = smp_processor_id(); 353 int cpu = smp_processor_id();
354 354
@@ -433,7 +433,7 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
433 433
434static void early_init_amd_mc(struct cpuinfo_x86 *c) 434static void early_init_amd_mc(struct cpuinfo_x86 *c)
435{ 435{
436#ifdef CONFIG_X86_HT 436#ifdef CONFIG_SMP
437 unsigned bits, ecx; 437 unsigned bits, ecx;
438 438
439 /* Multi core CPU? */ 439 /* Multi core CPU? */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b28e5262a0a5..9fc5e3d9d9c8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -491,7 +491,7 @@ static void cpu_detect_tlb(struct cpuinfo_x86 *c)
491 491
492void detect_ht(struct cpuinfo_x86 *c) 492void detect_ht(struct cpuinfo_x86 *c)
493{ 493{
494#ifdef CONFIG_X86_HT 494#ifdef CONFIG_SMP
495 u32 eax, ebx, ecx, edx; 495 u32 eax, ebx, ecx, edx;
496 int index_msb, core_bits; 496 int index_msb, core_bits;
497 static bool printed; 497 static bool printed;
@@ -827,7 +827,7 @@ static void generic_identify(struct cpuinfo_x86 *c)
827 if (c->cpuid_level >= 0x00000001) { 827 if (c->cpuid_level >= 0x00000001) {
828 c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; 828 c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
829#ifdef CONFIG_X86_32 829#ifdef CONFIG_X86_32
830# ifdef CONFIG_X86_HT 830# ifdef CONFIG_SMP
831 c->apicid = apic->phys_pkg_id(c->initial_apicid, 0); 831 c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
832# else 832# else
833 c->apicid = c->initial_apicid; 833 c->apicid = c->initial_apicid;
@@ -1009,7 +1009,7 @@ void enable_sep_cpu(void)
1009 (unsigned long)tss + offsetofend(struct tss_struct, SYSENTER_stack), 1009 (unsigned long)tss + offsetofend(struct tss_struct, SYSENTER_stack),
1010 0); 1010 0);
1011 1011
1012 wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)ia32_sysenter_target, 0); 1012 wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0);
1013 1013
1014out: 1014out:
1015 put_cpu(); 1015 put_cpu();
@@ -1138,10 +1138,6 @@ static __init int setup_disablecpuid(char *arg)
1138} 1138}
1139__setup("clearcpuid=", setup_disablecpuid); 1139__setup("clearcpuid=", setup_disablecpuid);
1140 1140
1141DEFINE_PER_CPU(unsigned long, kernel_stack) =
1142 (unsigned long)&init_thread_union + THREAD_SIZE;
1143EXPORT_PER_CPU_SYMBOL(kernel_stack);
1144
1145#ifdef CONFIG_X86_64 1141#ifdef CONFIG_X86_64
1146struct desc_ptr idt_descr = { NR_VECTORS * 16 - 1, (unsigned long) idt_table }; 1142struct desc_ptr idt_descr = { NR_VECTORS * 16 - 1, (unsigned long) idt_table };
1147struct desc_ptr debug_idt_descr = { NR_VECTORS * 16 - 1, 1143struct desc_ptr debug_idt_descr = { NR_VECTORS * 16 - 1,
@@ -1189,10 +1185,10 @@ void syscall_init(void)
1189 * set CS/DS but only a 32bit target. LSTAR sets the 64bit rip. 1185 * set CS/DS but only a 32bit target. LSTAR sets the 64bit rip.
1190 */ 1186 */
1191 wrmsrl(MSR_STAR, ((u64)__USER32_CS)<<48 | ((u64)__KERNEL_CS)<<32); 1187 wrmsrl(MSR_STAR, ((u64)__USER32_CS)<<48 | ((u64)__KERNEL_CS)<<32);
1192 wrmsrl(MSR_LSTAR, system_call); 1188 wrmsrl(MSR_LSTAR, entry_SYSCALL_64);
1193 1189
1194#ifdef CONFIG_IA32_EMULATION 1190#ifdef CONFIG_IA32_EMULATION
1195 wrmsrl(MSR_CSTAR, ia32_cstar_target); 1191 wrmsrl(MSR_CSTAR, entry_SYSCALL_compat);
1196 /* 1192 /*
1197 * This only works on Intel CPUs. 1193 * This only works on Intel CPUs.
1198 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP. 1194 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
@@ -1201,7 +1197,7 @@ void syscall_init(void)
1201 */ 1197 */
1202 wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS); 1198 wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
1203 wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL); 1199 wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
1204 wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)ia32_sysenter_target); 1200 wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
1205#else 1201#else
1206 wrmsrl(MSR_CSTAR, ignore_sysret); 1202 wrmsrl(MSR_CSTAR, ignore_sysret);
1207 wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG); 1203 wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
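Only the entry-point symbols change in syscall_init(); the MSR_STAR packing of the SYSCALL/SYSRET segment bases is untouched. For reference, that packing is plain bit arithmetic, shown here as a stand-alone model with illustrative selector values (the real values come from __KERNEL_CS and __USER32_CS in the GDT layout).

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Illustrative selectors; the kernel uses __USER32_CS and __KERNEL_CS. */
	uint64_t user32_cs = 0x23;
	uint64_t kernel_cs = 0x10;

	/* Same packing as the wrmsrl(MSR_STAR, ...) call:
	 * SYSRET CS/SS base in bits [63:48], SYSCALL CS/SS base in bits [47:32]. */
	uint64_t star = (user32_cs << 48) | (kernel_cs << 32);

	printf("MSR_STAR = 0x%016llx\n", (unsigned long long)star);
	return 0;
}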
diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index edcb0e28c336..be4febc58b94 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -654,7 +654,7 @@ unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c)
654 unsigned int new_l1d = 0, new_l1i = 0; /* Cache sizes from cpuid(4) */ 654 unsigned int new_l1d = 0, new_l1i = 0; /* Cache sizes from cpuid(4) */
655 unsigned int new_l2 = 0, new_l3 = 0, i; /* Cache sizes from cpuid(4) */ 655 unsigned int new_l2 = 0, new_l3 = 0, i; /* Cache sizes from cpuid(4) */
656 unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb; 656 unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
657#ifdef CONFIG_X86_HT 657#ifdef CONFIG_SMP
658 unsigned int cpu = c->cpu_index; 658 unsigned int cpu = c->cpu_index;
659#endif 659#endif
660 660
@@ -773,19 +773,19 @@ unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c)
773 773
774 if (new_l2) { 774 if (new_l2) {
775 l2 = new_l2; 775 l2 = new_l2;
776#ifdef CONFIG_X86_HT 776#ifdef CONFIG_SMP
777 per_cpu(cpu_llc_id, cpu) = l2_id; 777 per_cpu(cpu_llc_id, cpu) = l2_id;
778#endif 778#endif
779 } 779 }
780 780
781 if (new_l3) { 781 if (new_l3) {
782 l3 = new_l3; 782 l3 = new_l3;
783#ifdef CONFIG_X86_HT 783#ifdef CONFIG_SMP
784 per_cpu(cpu_llc_id, cpu) = l3_id; 784 per_cpu(cpu_llc_id, cpu) = l3_id;
785#endif 785#endif
786 } 786 }
787 787
788#ifdef CONFIG_X86_HT 788#ifdef CONFIG_SMP
789 /* 789 /*
790 * If cpu_llc_id is not yet set, this means cpuid_level < 4 which in 790 * If cpu_llc_id is not yet set, this means cpuid_level < 4 which in
791 * turns means that the only possibility is SMT (as indicated in 791 * turns means that the only possibility is SMT (as indicated in
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 95cf78d44ab4..df919ff103c3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1053,6 +1053,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
1053 char *msg = "Unknown"; 1053 char *msg = "Unknown";
1054 u64 recover_paddr = ~0ull; 1054 u64 recover_paddr = ~0ull;
1055 int flags = MF_ACTION_REQUIRED; 1055 int flags = MF_ACTION_REQUIRED;
1056 int lmce = 0;
1056 1057
1057 prev_state = ist_enter(regs); 1058 prev_state = ist_enter(regs);
1058 1059
@@ -1080,11 +1081,20 @@ void do_machine_check(struct pt_regs *regs, long error_code)
1080 kill_it = 1; 1081 kill_it = 1;
1081 1082
1082 /* 1083 /*
1083 * Go through all the banks in exclusion of the other CPUs. 1084 * Check if this MCE is signaled to only this logical processor
1084 * This way we don't report duplicated events on shared banks
1085 * because the first one to see it will clear it.
1086 */ 1085 */
1087 order = mce_start(&no_way_out); 1086 if (m.mcgstatus & MCG_STATUS_LMCES)
1087 lmce = 1;
1088 else {
1089 /*
1090 * Go through all the banks in exclusion of the other CPUs.
1091 * This way we don't report duplicated events on shared banks
1092 * because the first one to see it will clear it.
1093 * If this is a Local MCE, then no need to perform rendezvous.
1094 */
1095 order = mce_start(&no_way_out);
1096 }
1097
1088 for (i = 0; i < cfg->banks; i++) { 1098 for (i = 0; i < cfg->banks; i++) {
1089 __clear_bit(i, toclear); 1099 __clear_bit(i, toclear);
1090 if (!test_bit(i, valid_banks)) 1100 if (!test_bit(i, valid_banks))
@@ -1161,8 +1171,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
1161 * Do most of the synchronization with other CPUs. 1171 * Do most of the synchronization with other CPUs.
1162 * When there's any problem use only local no_way_out state. 1172 * When there's any problem use only local no_way_out state.
1163 */ 1173 */
1164 if (mce_end(order) < 0) 1174 if (!lmce) {
1165 no_way_out = worst >= MCE_PANIC_SEVERITY; 1175 if (mce_end(order) < 0)
1176 no_way_out = worst >= MCE_PANIC_SEVERITY;
1177 } else {
1178 /*
1179 * Local MCE skipped calling mce_reign()
1180 * If we found a fatal error, we need to panic here.
1181 */
1182 if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
1183 mce_panic("Machine check from unknown source",
1184 NULL, NULL);
1185 }
1166 1186
1167 /* 1187 /*
1168 * At insane "tolerant" levels we take no action. Otherwise 1188 * At insane "tolerant" levels we take no action. Otherwise
@@ -1643,10 +1663,16 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
1643 mce_intel_feature_init(c); 1663 mce_intel_feature_init(c);
1644 mce_adjust_timer = cmci_intel_adjust_timer; 1664 mce_adjust_timer = cmci_intel_adjust_timer;
1645 break; 1665 break;
1646 case X86_VENDOR_AMD: 1666
1667 case X86_VENDOR_AMD: {
1668 u32 ebx = cpuid_ebx(0x80000007);
1669
1647 mce_amd_feature_init(c); 1670 mce_amd_feature_init(c);
1648 mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1; 1671 mce_flags.overflow_recov = !!(ebx & BIT(0));
1672 mce_flags.succor = !!(ebx & BIT(1));
1649 break; 1673 break;
1674 }
1675
1650 default: 1676 default:
1651 break; 1677 break;
1652 } 1678 }
@@ -1982,6 +2008,7 @@ void mce_disable_bank(int bank)
1982/* 2008/*
1983 * mce=off Disables machine check 2009 * mce=off Disables machine check
1984 * mce=no_cmci Disables CMCI 2010 * mce=no_cmci Disables CMCI
2011 * mce=no_lmce Disables LMCE
1985 * mce=dont_log_ce Clears corrected events silently, no log created for CEs. 2012 * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
1986 * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared. 2013 * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
1987 * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above) 2014 * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
@@ -2005,6 +2032,8 @@ static int __init mcheck_enable(char *str)
2005 cfg->disabled = true; 2032 cfg->disabled = true;
2006 else if (!strcmp(str, "no_cmci")) 2033 else if (!strcmp(str, "no_cmci"))
2007 cfg->cmci_disabled = true; 2034 cfg->cmci_disabled = true;
2035 else if (!strcmp(str, "no_lmce"))
2036 cfg->lmce_disabled = true;
2008 else if (!strcmp(str, "dont_log_ce")) 2037 else if (!strcmp(str, "dont_log_ce"))
2009 cfg->dont_log_ce = true; 2038 cfg->dont_log_ce = true;
2010 else if (!strcmp(str, "ignore_ce")) 2039 else if (!strcmp(str, "ignore_ce"))
@@ -2014,11 +2043,8 @@ static int __init mcheck_enable(char *str)
2014 else if (!strcmp(str, "bios_cmci_threshold")) 2043 else if (!strcmp(str, "bios_cmci_threshold"))
2015 cfg->bios_cmci_threshold = true; 2044 cfg->bios_cmci_threshold = true;
2016 else if (isdigit(str[0])) { 2045 else if (isdigit(str[0])) {
2017 get_option(&str, &(cfg->tolerant)); 2046 if (get_option(&str, &cfg->tolerant) == 2)
2018 if (*str == ',') {
2019 ++str;
2020 get_option(&str, &(cfg->monarch_timeout)); 2047 get_option(&str, &(cfg->monarch_timeout));
2021 }
2022 } else { 2048 } else {
2023 pr_info("mce argument %s ignored. Please use /sys\n", str); 2049 pr_info("mce argument %s ignored. Please use /sys\n", str);
2024 return 0; 2050 return 0;
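The new do_machine_check() flow keys off MCG_STATUS_LMCES: when the bit is set the event was signaled only to this logical CPU, so the mce_start()/mce_end() rendezvous is skipped and a fatal error is panicked locally. A stand-alone model of that bit test follows; the bit position used here is an assumption for illustration, real code uses the kernel's MCG_STATUS_LMCES define.

#include <stdio.h>
#include <stdint.h>

#define MCG_STATUS_LMCES (1ULL << 3)	/* assumed bit position; use the kernel define in real code */

static int is_local_mce(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_LMCES) != 0;
}

int main(void)
{
	uint64_t broadcast = 0x5;	/* RIPV | MCIP, no LMCES: rendezvous with other CPUs */
	uint64_t local     = 0xd;	/* RIPV | MCIP | LMCES: handle on this CPU only */

	printf("broadcast sample: %s\n", is_local_mce(broadcast) ? "local" : "rendezvous");
	printf("local sample:     %s\n", is_local_mce(local) ? "local" : "rendezvous");
	return 0;
}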
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 55ad9b37cae8..e99b15077e94 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -1,19 +1,13 @@
1/* 1/*
2 * (c) 2005-2012 Advanced Micro Devices, Inc. 2 * (c) 2005-2015 Advanced Micro Devices, Inc.
3 * Your use of this code is subject to the terms and conditions of the 3 * Your use of this code is subject to the terms and conditions of the
4 * GNU general public license version 2. See "COPYING" or 4 * GNU general public license version 2. See "COPYING" or
5 * http://www.gnu.org/licenses/gpl.html 5 * http://www.gnu.org/licenses/gpl.html
6 * 6 *
7 * Written by Jacob Shin - AMD, Inc. 7 * Written by Jacob Shin - AMD, Inc.
8 *
9 * Maintained by: Borislav Petkov <bp@alien8.de> 8 * Maintained by: Borislav Petkov <bp@alien8.de>
10 * 9 *
11 * April 2006 10 * All MC4_MISCi registers are shared between cores on a node.
12 * - added support for AMD Family 0x10 processors
13 * May 2012
14 * - major scrubbing
15 *
16 * All MC4_MISCi registers are shared between multi-cores
17 */ 11 */
18#include <linux/interrupt.h> 12#include <linux/interrupt.h>
19#include <linux/notifier.h> 13#include <linux/notifier.h>
@@ -32,6 +26,7 @@
32#include <asm/idle.h> 26#include <asm/idle.h>
33#include <asm/mce.h> 27#include <asm/mce.h>
34#include <asm/msr.h> 28#include <asm/msr.h>
29#include <asm/trace/irq_vectors.h>
35 30
36#define NR_BLOCKS 9 31#define NR_BLOCKS 9
37#define THRESHOLD_MAX 0xFFF 32#define THRESHOLD_MAX 0xFFF
@@ -47,6 +42,13 @@
47#define MASK_BLKPTR_LO 0xFF000000 42#define MASK_BLKPTR_LO 0xFF000000
48#define MCG_XBLK_ADDR 0xC0000400 43#define MCG_XBLK_ADDR 0xC0000400
49 44
45/* Deferred error settings */
46#define MSR_CU_DEF_ERR 0xC0000410
47#define MASK_DEF_LVTOFF 0x000000F0
48#define MASK_DEF_INT_TYPE 0x00000006
49#define DEF_LVT_OFF 0x2
50#define DEF_INT_TYPE_APIC 0x2
51
50static const char * const th_names[] = { 52static const char * const th_names[] = {
51 "load_store", 53 "load_store",
52 "insn_fetch", 54 "insn_fetch",
@@ -60,6 +62,13 @@ static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
60static DEFINE_PER_CPU(unsigned char, bank_map); /* see which banks are on */ 62static DEFINE_PER_CPU(unsigned char, bank_map); /* see which banks are on */
61 63
62static void amd_threshold_interrupt(void); 64static void amd_threshold_interrupt(void);
65static void amd_deferred_error_interrupt(void);
66
67static void default_deferred_error_interrupt(void)
68{
69 pr_err("Unexpected deferred interrupt at vector %x\n", DEFERRED_ERROR_VECTOR);
70}
71void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;
63 72
64/* 73/*
65 * CPU Initialization 74 * CPU Initialization
@@ -196,7 +205,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
196 threshold_restart_bank(&tr); 205 threshold_restart_bank(&tr);
197}; 206};
198 207
199static int setup_APIC_mce(int reserved, int new) 208static int setup_APIC_mce_threshold(int reserved, int new)
200{ 209{
201 if (reserved < 0 && !setup_APIC_eilvt(new, THRESHOLD_APIC_VECTOR, 210 if (reserved < 0 && !setup_APIC_eilvt(new, THRESHOLD_APIC_VECTOR,
202 APIC_EILVT_MSG_FIX, 0)) 211 APIC_EILVT_MSG_FIX, 0))
@@ -205,6 +214,39 @@ static int setup_APIC_mce(int reserved, int new)
205 return reserved; 214 return reserved;
206} 215}
207 216
217static int setup_APIC_deferred_error(int reserved, int new)
218{
219 if (reserved < 0 && !setup_APIC_eilvt(new, DEFERRED_ERROR_VECTOR,
220 APIC_EILVT_MSG_FIX, 0))
221 return new;
222
223 return reserved;
224}
225
226static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
227{
228 u32 low = 0, high = 0;
229 int def_offset = -1, def_new;
230
231 if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
232 return;
233
234 def_new = (low & MASK_DEF_LVTOFF) >> 4;
235 if (!(low & MASK_DEF_LVTOFF)) {
236 pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
237 def_new = DEF_LVT_OFF;
238 low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
239 }
240
241 def_offset = setup_APIC_deferred_error(def_offset, def_new);
242 if ((def_offset == def_new) &&
243 (deferred_error_int_vector != amd_deferred_error_interrupt))
244 deferred_error_int_vector = amd_deferred_error_interrupt;
245
246 low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
247 wrmsr(MSR_CU_DEF_ERR, low, high);
248}
249
208/* cpu init entry point, called from mce.c with preempt off */ 250/* cpu init entry point, called from mce.c with preempt off */
209void mce_amd_feature_init(struct cpuinfo_x86 *c) 251void mce_amd_feature_init(struct cpuinfo_x86 *c)
210{ 252{
@@ -252,7 +294,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
252 294
253 b.interrupt_enable = 1; 295 b.interrupt_enable = 1;
254 new = (high & MASK_LVTOFF_HI) >> 20; 296 new = (high & MASK_LVTOFF_HI) >> 20;
255 offset = setup_APIC_mce(offset, new); 297 offset = setup_APIC_mce_threshold(offset, new);
256 298
257 if ((offset == new) && 299 if ((offset == new) &&
258 (mce_threshold_vector != amd_threshold_interrupt)) 300 (mce_threshold_vector != amd_threshold_interrupt))
@@ -262,6 +304,73 @@ init:
262 mce_threshold_block_init(&b, offset); 304 mce_threshold_block_init(&b, offset);
263 } 305 }
264 } 306 }
307
308 if (mce_flags.succor)
309 deferred_error_interrupt_enable(c);
310}
311
312static void __log_error(unsigned int bank, bool threshold_err, u64 misc)
313{
314 struct mce m;
315 u64 status;
316
317 rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
318 if (!(status & MCI_STATUS_VAL))
319 return;
320
321 mce_setup(&m);
322
323 m.status = status;
324 m.bank = bank;
325
326 if (threshold_err)
327 m.misc = misc;
328
329 if (m.status & MCI_STATUS_ADDRV)
330 rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
331
332 mce_log(&m);
333 wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
334}
335
336static inline void __smp_deferred_error_interrupt(void)
337{
338 inc_irq_stat(irq_deferred_error_count);
339 deferred_error_int_vector();
340}
341
342asmlinkage __visible void smp_deferred_error_interrupt(void)
343{
344 entering_irq();
345 __smp_deferred_error_interrupt();
346 exiting_ack_irq();
347}
348
349asmlinkage __visible void smp_trace_deferred_error_interrupt(void)
350{
351 entering_irq();
352 trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR);
353 __smp_deferred_error_interrupt();
354 trace_deferred_error_apic_exit(DEFERRED_ERROR_VECTOR);
355 exiting_ack_irq();
356}
357
358/* APIC interrupt handler for deferred errors */
359static void amd_deferred_error_interrupt(void)
360{
361 u64 status;
362 unsigned int bank;
363
364 for (bank = 0; bank < mca_cfg.banks; ++bank) {
365 rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
366
367 if (!(status & MCI_STATUS_VAL) ||
368 !(status & MCI_STATUS_DEFERRED))
369 continue;
370
371 __log_error(bank, false, 0);
372 break;
373 }
265} 374}
266 375
267/* 376/*
@@ -273,12 +382,12 @@ init:
273 * the interrupt goes off when error_count reaches threshold_limit. 382 * the interrupt goes off when error_count reaches threshold_limit.
274 * the handler will simply log mcelog w/ software defined bank number. 383 * the handler will simply log mcelog w/ software defined bank number.
275 */ 384 */
385
276static void amd_threshold_interrupt(void) 386static void amd_threshold_interrupt(void)
277{ 387{
278 u32 low = 0, high = 0, address = 0; 388 u32 low = 0, high = 0, address = 0;
279 int cpu = smp_processor_id(); 389 int cpu = smp_processor_id();
280 unsigned int bank, block; 390 unsigned int bank, block;
281 struct mce m;
282 391
283 /* assume first bank caused it */ 392 /* assume first bank caused it */
284 for (bank = 0; bank < mca_cfg.banks; ++bank) { 393 for (bank = 0; bank < mca_cfg.banks; ++bank) {
@@ -321,15 +430,7 @@ static void amd_threshold_interrupt(void)
321 return; 430 return;
322 431
323log: 432log:
324 mce_setup(&m); 433 __log_error(bank, true, ((u64)high << 32) | low);
325 rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
326 if (!(m.status & MCI_STATUS_VAL))
327 return;
328 m.misc = ((u64)high << 32) | low;
329 m.bank = bank;
330 mce_log(&m);
331
332 wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
333} 434}
334 435
335/* 436/*
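deferred_error_interrupt_enable() reads MSR_CU_DEF_ERR (0xC0000410), takes the firmware-provided LVT offset from bits [7:4], falls back to offset 0x2 with a FW_BUG warning when the field is empty, and forces the delivery type to APIC. A small stand-alone model of that field decoding, using the masks from the patch; the sample MSR value is made up for illustration.

#include <stdio.h>
#include <stdint.h>

#define MASK_DEF_LVTOFF		0x000000F0
#define MASK_DEF_INT_TYPE	0x00000006
#define DEF_LVT_OFF		0x2
#define DEF_INT_TYPE_APIC	0x2

int main(void)
{
	uint32_t low = 0x00000020;	/* sample MSR_CU_DEF_ERR low half: LVT offset field = 2 */
	uint32_t lvt_off = (low & MASK_DEF_LVTOFF) >> 4;

	if (!(low & MASK_DEF_LVTOFF)) {
		/* Firmware left the field empty: fall back to offset 0x2, as the patch does. */
		lvt_off = DEF_LVT_OFF;
		low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
	}

	/* Force the delivery type to APIC, mirroring the wrmsr() in the patch. */
	low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;

	printf("LVT offset %u, MSR low word now 0x%08x\n", lvt_off, low);
	return 0;
}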
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index b4a41cf030ed..844f56c5616d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -91,6 +91,36 @@ static int cmci_supported(int *banks)
91 return !!(cap & MCG_CMCI_P); 91 return !!(cap & MCG_CMCI_P);
92} 92}
93 93
94static bool lmce_supported(void)
95{
96 u64 tmp;
97
98 if (mca_cfg.lmce_disabled)
99 return false;
100
101 rdmsrl(MSR_IA32_MCG_CAP, tmp);
102
103 /*
104 * LMCE depends on recovery support in the processor. Hence both
105 * MCG_SER_P and MCG_LMCE_P should be present in MCG_CAP.
106 */
107 if ((tmp & (MCG_SER_P | MCG_LMCE_P)) !=
108 (MCG_SER_P | MCG_LMCE_P))
109 return false;
110
111 /*
112 * BIOS should indicate support for LMCE by setting bit 20 in
113 * IA32_FEATURE_CONTROL without which touching MCG_EXT_CTL will
114 * generate a #GP fault.
115 */
116 rdmsrl(MSR_IA32_FEATURE_CONTROL, tmp);
117 if ((tmp & (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE)) ==
118 (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE))
119 return true;
120
121 return false;
122}
123
94bool mce_intel_cmci_poll(void) 124bool mce_intel_cmci_poll(void)
95{ 125{
96 if (__this_cpu_read(cmci_storm_state) == CMCI_STORM_NONE) 126 if (__this_cpu_read(cmci_storm_state) == CMCI_STORM_NONE)
@@ -405,8 +435,22 @@ static void intel_init_cmci(void)
405 cmci_recheck(); 435 cmci_recheck();
406} 436}
407 437
438void intel_init_lmce(void)
439{
440 u64 val;
441
442 if (!lmce_supported())
443 return;
444
445 rdmsrl(MSR_IA32_MCG_EXT_CTL, val);
446
447 if (!(val & MCG_EXT_CTL_LMCE_EN))
448 wrmsrl(MSR_IA32_MCG_EXT_CTL, val | MCG_EXT_CTL_LMCE_EN);
449}
450
408void mce_intel_feature_init(struct cpuinfo_x86 *c) 451void mce_intel_feature_init(struct cpuinfo_x86 *c)
409{ 452{
410 intel_init_thermal(c); 453 intel_init_thermal(c);
411 intel_init_cmci(); 454 intel_init_cmci();
455 intel_init_lmce();
412} 456}
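lmce_supported() requires two independent opt-ins: the CPU must advertise both MCG_SER_P and MCG_LMCE_P in IA32_MCG_CAP, and the BIOS must have set the LMCE enable bit together with the lock bit in IA32_FEATURE_CONTROL, otherwise touching MCG_EXT_CTL would #GP. A stand-alone model of that double check; the bit positions below are assumptions for illustration, real code uses the kernel's MCG_ and FEATURE_CONTROL_ defines.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define MCG_SER_P		(1ULL << 24)	/* assumed bit positions */
#define MCG_LMCE_P		(1ULL << 27)
#define FEATURE_CONTROL_LOCKED	(1ULL << 0)
#define FEATURE_CONTROL_LMCE	(1ULL << 20)

static bool lmce_supported_model(uint64_t mcg_cap, uint64_t feature_control)
{
	/* CPU must support both software error recovery and LMCE. */
	if ((mcg_cap & (MCG_SER_P | MCG_LMCE_P)) != (MCG_SER_P | MCG_LMCE_P))
		return false;

	/* BIOS must have opted in and locked the feature-control MSR. */
	return (feature_control & (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE)) ==
	       (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE);
}

int main(void)
{
	printf("capable + opted in : %d\n",
	       lmce_supported_model(MCG_SER_P | MCG_LMCE_P,
				    FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE));
	printf("capable, no opt-in : %d\n",
	       lmce_supported_model(MCG_SER_P | MCG_LMCE_P, FEATURE_CONTROL_LOCKED));
	return 0;
}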
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 939155ffdece..aad4bd84b475 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -39,14 +39,12 @@ void hyperv_vector_handler(struct pt_regs *regs)
39{ 39{
40 struct pt_regs *old_regs = set_irq_regs(regs); 40 struct pt_regs *old_regs = set_irq_regs(regs);
41 41
42 irq_enter(); 42 entering_irq();
43 exit_idle();
44
45 inc_irq_stat(irq_hv_callback_count); 43 inc_irq_stat(irq_hv_callback_count);
46 if (vmbus_handler) 44 if (vmbus_handler)
47 vmbus_handler(); 45 vmbus_handler();
48 46
49 irq_exit(); 47 exiting_irq();
50 set_irq_regs(old_regs); 48 set_irq_regs(old_regs);
51} 49}
52 50
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85ff22e..70d7c93f4550 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -98,7 +98,8 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
98 continue; 98 continue;
99 base = range_state[i].base_pfn; 99 base = range_state[i].base_pfn;
100 if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed && 100 if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
101 (mtrr_state.enabled & 1)) { 101 (mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
102 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
102 /* Var MTRR contains UC entry below 1M? Skip it: */ 103 /* Var MTRR contains UC entry below 1M? Skip it: */
103 printk(BIOS_BUG_MSG, i); 104 printk(BIOS_BUG_MSG, i);
104 if (base + size <= (1<<(20-PAGE_SHIFT))) 105 if (base + size <= (1<<(20-PAGE_SHIFT)))
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7d74f7b3c6ba..3b533cf37c74 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,59 +102,76 @@ static int check_type_overlap(u8 *prev, u8 *curr)
102 return 0; 102 return 0;
103} 103}
104 104
105/* 105/**
106 * Error/Semi-error returns: 106 * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
107 * 0xFF - when MTRR is not enabled 107 *
108 * *repeat == 1 implies [start:end] spanned across MTRR range and type returned 108 * Return the MTRR fixed memory type of 'start'.
109 * corresponds only to [start:*partial_end]. 109 *
110 * Caller has to lookup again for [*partial_end:end]. 110 * MTRR fixed entries are divided into the following ways:
111 * 0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
112 * 0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
113 * 0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
114 *
115 * Return Values:
116 * MTRR_TYPE_(type) - Matched memory type
117 * MTRR_TYPE_INVALID - Unmatched
118 */
119static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
120{
121 int idx;
122
123 if (start >= 0x100000)
124 return MTRR_TYPE_INVALID;
125
126 /* 0x0 - 0x7FFFF */
127 if (start < 0x80000) {
128 idx = 0;
129 idx += (start >> 16);
130 return mtrr_state.fixed_ranges[idx];
131 /* 0x80000 - 0xBFFFF */
132 } else if (start < 0xC0000) {
133 idx = 1 * 8;
134 idx += ((start - 0x80000) >> 14);
135 return mtrr_state.fixed_ranges[idx];
136 }
137
138 /* 0xC0000 - 0xFFFFF */
139 idx = 3 * 8;
140 idx += ((start - 0xC0000) >> 12);
141 return mtrr_state.fixed_ranges[idx];
142}
143
144/**
145 * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
146 *
147 * Return Value:
148 * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
149 *
150 * Output Arguments:
151 * repeat - Set to 1 when [start:end] spanned across MTRR range and type
152 * returned corresponds only to [start:*partial_end]. Caller has
153 * to lookup again for [*partial_end:end].
154 *
155 * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
156 * region is fully covered by a single MTRR entry or the default
157 * type.
111 */ 158 */
112static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat) 159static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
160 int *repeat, u8 *uniform)
113{ 161{
114 int i; 162 int i;
115 u64 base, mask; 163 u64 base, mask;
116 u8 prev_match, curr_match; 164 u8 prev_match, curr_match;
117 165
118 *repeat = 0; 166 *repeat = 0;
119 if (!mtrr_state_set) 167 *uniform = 1;
120 return 0xFF;
121
122 if (!mtrr_state.enabled)
123 return 0xFF;
124 168
125 /* Make end inclusive end, instead of exclusive */ 169 /* Make end inclusive instead of exclusive */
126 end--; 170 end--;
127 171
128 /* Look in fixed ranges. Just return the type as per start */ 172 prev_match = MTRR_TYPE_INVALID;
129 if (mtrr_state.have_fixed && (start < 0x100000)) {
130 int idx;
131
132 if (start < 0x80000) {
133 idx = 0;
134 idx += (start >> 16);
135 return mtrr_state.fixed_ranges[idx];
136 } else if (start < 0xC0000) {
137 idx = 1 * 8;
138 idx += ((start - 0x80000) >> 14);
139 return mtrr_state.fixed_ranges[idx];
140 } else if (start < 0x1000000) {
141 idx = 3 * 8;
142 idx += ((start - 0xC0000) >> 12);
143 return mtrr_state.fixed_ranges[idx];
144 }
145 }
146
147 /*
148 * Look in variable ranges
149 * Look of multiple ranges matching this address and pick type
150 * as per MTRR precedence
151 */
152 if (!(mtrr_state.enabled & 2))
153 return mtrr_state.def_type;
154
155 prev_match = 0xFF;
156 for (i = 0; i < num_var_ranges; ++i) { 173 for (i = 0; i < num_var_ranges; ++i) {
157 unsigned short start_state, end_state; 174 unsigned short start_state, end_state, inclusive;
158 175
159 if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11))) 176 if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
160 continue; 177 continue;
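The fixed-range helper maps an address below 1MB straight to an index into fixed_ranges[]: eight 64KB entries, then sixteen 16KB entries, then sixty-four 4KB entries, 88 entries in total. A stand-alone check of that index arithmetic, mirroring mtrr_type_lookup_fixed() from the hunk above (the helper name here is illustrative).

#include <stdio.h>
#include <stdint.h>

/* Same index computation as mtrr_type_lookup_fixed(); returns -1 above 1MB. */
static int fixed_range_index(uint64_t start)
{
	if (start >= 0x100000)
		return -1;
	if (start < 0x80000)
		return (int)(start >> 16);			/* 8 x 64KB entries: 0..7    */
	if (start < 0xC0000)
		return (int)(1 * 8 + ((start - 0x80000) >> 14));	/* 16 x 16KB entries: 8..23  */
	return (int)(3 * 8 + ((start - 0xC0000) >> 12));		/* 64 x 4KB entries: 24..87  */
}

int main(void)
{
	printf("0x00000 -> %d\n", fixed_range_index(0x00000));	/* 0  */
	printf("0x7FFFF -> %d\n", fixed_range_index(0x7FFFF));	/* 7  */
	printf("0x80000 -> %d\n", fixed_range_index(0x80000));	/* 8  */
	printf("0xC0000 -> %d\n", fixed_range_index(0xC0000));	/* 24 */
	printf("0xFFFFF -> %d\n", fixed_range_index(0xFFFFF));	/* 87 */
	return 0;
}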
@@ -166,20 +183,29 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
166 183
167 start_state = ((start & mask) == (base & mask)); 184 start_state = ((start & mask) == (base & mask));
168 end_state = ((end & mask) == (base & mask)); 185 end_state = ((end & mask) == (base & mask));
186 inclusive = ((start < base) && (end > base));
169 187
170 if (start_state != end_state) { 188 if ((start_state != end_state) || inclusive) {
171 /* 189 /*
172 * We have start:end spanning across an MTRR. 190 * We have start:end spanning across an MTRR.
173 * We split the region into 191 * We split the region into either
174 * either 192 *
175 * (start:mtrr_end) (mtrr_end:end) 193 * - start_state:1
176 * or 194 * (start:mtrr_end)(mtrr_end:end)
177 * (start:mtrr_start) (mtrr_start:end) 195 * - end_state:1
196 * (start:mtrr_start)(mtrr_start:end)
197 * - inclusive:1
198 * (start:mtrr_start)(mtrr_start:mtrr_end)(mtrr_end:end)
199 *
178 * depending on kind of overlap. 200 * depending on kind of overlap.
179 * Return the type for first region and a pointer to 201 *
180 * the start of second region so that caller will 202 * Return the type of the first region and a pointer
181 * lookup again on the second region. 203 * to the start of next region so that caller will be
182 * Note: This way we handle multiple overlaps as well. 204 * advised to lookup again after having adjusted start
205 * and end.
206 *
207 * Note: This way we handle overlaps with multiple
208 * entries and the default type properly.
183 */ 209 */
184 if (start_state) 210 if (start_state)
185 *partial_end = base + get_mtrr_size(mask); 211 *partial_end = base + get_mtrr_size(mask);
@@ -193,59 +219,94 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
193 219
194 end = *partial_end - 1; /* end is inclusive */ 220 end = *partial_end - 1; /* end is inclusive */
195 *repeat = 1; 221 *repeat = 1;
222 *uniform = 0;
196 } 223 }
197 224
198 if ((start & mask) != (base & mask)) 225 if ((start & mask) != (base & mask))
199 continue; 226 continue;
200 227
201 curr_match = mtrr_state.var_ranges[i].base_lo & 0xff; 228 curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
202 if (prev_match == 0xFF) { 229 if (prev_match == MTRR_TYPE_INVALID) {
203 prev_match = curr_match; 230 prev_match = curr_match;
204 continue; 231 continue;
205 } 232 }
206 233
234 *uniform = 0;
207 if (check_type_overlap(&prev_match, &curr_match)) 235 if (check_type_overlap(&prev_match, &curr_match))
208 return curr_match; 236 return curr_match;
209 } 237 }
210 238
211 if (mtrr_tom2) { 239 if (prev_match != MTRR_TYPE_INVALID)
212 if (start >= (1ULL<<32) && (end < mtrr_tom2))
213 return MTRR_TYPE_WRBACK;
214 }
215
216 if (prev_match != 0xFF)
217 return prev_match; 240 return prev_match;
218 241
219 return mtrr_state.def_type; 242 return mtrr_state.def_type;
220} 243}
221 244
222/* 245/**
223 * Returns the effective MTRR type for the region 246 * mtrr_type_lookup - look up memory type in MTRR
224 * Error return: 247 *
225 * 0xFF - when MTRR is not enabled 248 * Return Values:
249 * MTRR_TYPE_(type) - The effective MTRR type for the region
250 * MTRR_TYPE_INVALID - MTRR is disabled
251 *
252 * Output Argument:
253 * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
254 * region is fully covered by a single MTRR entry or the default
255 * type.
226 */ 256 */
227u8 mtrr_type_lookup(u64 start, u64 end) 257u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
228{ 258{
229 u8 type, prev_type; 259 u8 type, prev_type, is_uniform = 1, dummy;
230 int repeat; 260 int repeat;
231 u64 partial_end; 261 u64 partial_end;
232 262
233 type = __mtrr_type_lookup(start, end, &partial_end, &repeat); 263 if (!mtrr_state_set)
264 return MTRR_TYPE_INVALID;
265
266 if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
267 return MTRR_TYPE_INVALID;
268
269 /*
270 * Look up the fixed ranges first, which take priority over
271 * the variable ranges.
272 */
273 if ((start < 0x100000) &&
274 (mtrr_state.have_fixed) &&
275 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
276 is_uniform = 0;
277 type = mtrr_type_lookup_fixed(start, end);
278 goto out;
279 }
280
281 /*
 282 * Look up the variable ranges. Look for multiple ranges matching
283 * this address and pick type as per MTRR precedence.
284 */
285 type = mtrr_type_lookup_variable(start, end, &partial_end,
286 &repeat, &is_uniform);
234 287
235 /* 288 /*
236 * Common path is with repeat = 0. 289 * Common path is with repeat = 0.
237 * However, we can have cases where [start:end] spans across some 290 * However, we can have cases where [start:end] spans across some
238 * MTRR range. Do repeated lookups for that case here. 291 * MTRR ranges and/or the default type. Do repeated lookups for
292 * that case here.
239 */ 293 */
240 while (repeat) { 294 while (repeat) {
241 prev_type = type; 295 prev_type = type;
242 start = partial_end; 296 start = partial_end;
243 type = __mtrr_type_lookup(start, end, &partial_end, &repeat); 297 is_uniform = 0;
298 type = mtrr_type_lookup_variable(start, end, &partial_end,
299 &repeat, &dummy);
244 300
245 if (check_type_overlap(&prev_type, &type)) 301 if (check_type_overlap(&prev_type, &type))
246 return type; 302 goto out;
247 } 303 }
248 304
305 if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
306 type = MTRR_TYPE_WRBACK;
307
308out:
309 *uniform = is_uniform;
249 return type; 310 return type;
250} 311}
251 312
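The lookup above handles a [start:end] region that straddles an MTRR boundary by returning the type of the leading piece, advancing start, repeating, and combining the per-piece types according to MTRR precedence, clearing *uniform whenever a split or a second match occurs. A self-contained toy version of that driving loop (one made-up variable range and stand-in type constants; the precedence helper folds every conflict other than WT/WB into UC, which is a simplification of check_type_overlap()):

#include <stdint.h>
#include <stdio.h>

#define TYPE_UNCACHABLE 0
#define TYPE_WRTHROUGH  4
#define TYPE_WRBACK     6

/* One toy variable range plus a default type - stand-ins for mtrr_state. */
static const uint64_t range_base = 0x100000, range_last = 0x1fffff; /* inclusive */
static const uint8_t  range_type = TYPE_UNCACHABLE;
static const uint8_t  def_type   = TYPE_WRBACK;

/* Type at 'start'; *next is set to the first address where it may change. */
static uint8_t lookup_one(uint64_t start, uint64_t *next)
{
        if (start < range_base) {
                *next = range_base;
                return def_type;
        }
        if (start <= range_last) {
                *next = range_last + 1;
                return range_type;
        }
        *next = UINT64_MAX;
        return def_type;
}

/* MTRR precedence on overlap: UC always wins, WT wins over WB; every other
 * conflict is folded into UC here for brevity. */
static uint8_t combine(uint8_t a, uint8_t b)
{
        if (a == b)
                return a;
        if (a == TYPE_UNCACHABLE || b == TYPE_UNCACHABLE)
                return TYPE_UNCACHABLE;
        if ((a == TYPE_WRTHROUGH && b == TYPE_WRBACK) ||
            (a == TYPE_WRBACK && b == TYPE_WRTHROUGH))
                return TYPE_WRTHROUGH;
        return TYPE_UNCACHABLE;
}

int main(void)
{
        uint64_t start = 0x0f0000, end = 0x17ffff, next; /* spans into the range */
        uint8_t type = lookup_one(start, &next);
        int uniform = 1;

        while (next <= end) {           /* region crosses a boundary: repeat */
                type = combine(type, lookup_one(next, &next));
                uniform = 0;
        }
        printf("type=%u uniform=%d\n", (unsigned)type, uniform); /* type=0 uniform=0 */
        return 0;
}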
@@ -347,7 +408,9 @@ static void __init print_mtrr_state(void)
347 mtrr_attrib_to_str(mtrr_state.def_type)); 408 mtrr_attrib_to_str(mtrr_state.def_type));
348 if (mtrr_state.have_fixed) { 409 if (mtrr_state.have_fixed) {
349 pr_debug("MTRR fixed ranges %sabled:\n", 410 pr_debug("MTRR fixed ranges %sabled:\n",
350 mtrr_state.enabled & 1 ? "en" : "dis"); 411 ((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
412 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
413 "en" : "dis");
351 print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0); 414 print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
352 for (i = 0; i < 2; ++i) 415 for (i = 0; i < 2; ++i)
353 print_fixed(0x80000 + i * 0x20000, 0x04000, 416 print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -360,7 +423,7 @@ static void __init print_mtrr_state(void)
360 print_fixed_last(); 423 print_fixed_last();
361 } 424 }
362 pr_debug("MTRR variable ranges %sabled:\n", 425 pr_debug("MTRR variable ranges %sabled:\n",
363 mtrr_state.enabled & 2 ? "en" : "dis"); 426 mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
364 high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4; 427 high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
365 428
366 for (i = 0; i < num_var_ranges; ++i) { 429 for (i = 0; i < num_var_ranges; ++i) {
@@ -382,7 +445,7 @@ static void __init print_mtrr_state(void)
382} 445}
383 446
384/* Grab all of the MTRR state for this CPU into *state */ 447/* Grab all of the MTRR state for this CPU into *state */
385void __init get_mtrr_state(void) 448bool __init get_mtrr_state(void)
386{ 449{
387 struct mtrr_var_range *vrs; 450 struct mtrr_var_range *vrs;
388 unsigned long flags; 451 unsigned long flags;
@@ -426,6 +489,8 @@ void __init get_mtrr_state(void)
426 489
427 post_set(); 490 post_set();
428 local_irq_restore(flags); 491 local_irq_restore(flags);
492
493 return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
429} 494}
430 495
431/* Some BIOS's are messed up and don't set all MTRRs the same! */ 496/* Some BIOS's are messed up and don't set all MTRRs the same! */
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363a1948..e7ed0d8ebacb 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -59,6 +59,12 @@
59#define MTRR_TO_PHYS_WC_OFFSET 1000 59#define MTRR_TO_PHYS_WC_OFFSET 1000
60 60
61u32 num_var_ranges; 61u32 num_var_ranges;
62static bool __mtrr_enabled;
63
64static bool mtrr_enabled(void)
65{
66 return __mtrr_enabled;
67}
62 68
63unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES]; 69unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
64static DEFINE_MUTEX(mtrr_mutex); 70static DEFINE_MUTEX(mtrr_mutex);
@@ -286,7 +292,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
286 int i, replace, error; 292 int i, replace, error;
287 mtrr_type ltype; 293 mtrr_type ltype;
288 294
289 if (!mtrr_if) 295 if (!mtrr_enabled())
290 return -ENXIO; 296 return -ENXIO;
291 297
292 error = mtrr_if->validate_add_page(base, size, type); 298 error = mtrr_if->validate_add_page(base, size, type);
@@ -435,6 +441,8 @@ static int mtrr_check(unsigned long base, unsigned long size)
435int mtrr_add(unsigned long base, unsigned long size, unsigned int type, 441int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
436 bool increment) 442 bool increment)
437{ 443{
444 if (!mtrr_enabled())
445 return -ENODEV;
438 if (mtrr_check(base, size)) 446 if (mtrr_check(base, size))
439 return -EINVAL; 447 return -EINVAL;
440 return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type, 448 return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
@@ -463,8 +471,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
463 unsigned long lbase, lsize; 471 unsigned long lbase, lsize;
464 int error = -EINVAL; 472 int error = -EINVAL;
465 473
466 if (!mtrr_if) 474 if (!mtrr_enabled())
467 return -ENXIO; 475 return -ENODEV;
468 476
469 max = num_var_ranges; 477 max = num_var_ranges;
470 /* No CPU hotplug when we change MTRR entries */ 478 /* No CPU hotplug when we change MTRR entries */
@@ -523,6 +531,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
523 */ 531 */
524int mtrr_del(int reg, unsigned long base, unsigned long size) 532int mtrr_del(int reg, unsigned long base, unsigned long size)
525{ 533{
534 if (!mtrr_enabled())
535 return -ENODEV;
526 if (mtrr_check(base, size)) 536 if (mtrr_check(base, size))
527 return -EINVAL; 537 return -EINVAL;
528 return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT); 538 return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
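With the mtrr_enabled() helper above in place, the public entry points fail fast: callers of mtrr_add()/mtrr_del() now get -ENODEV when MTRRs are unusable, where the old code checked mtrr_if deeper down and returned -ENXIO. A hedged caller-side sketch - mydrv_try_wc() and its use of write-combining are hypothetical:

#include <asm/mtrr.h>
#include <linux/printk.h>

/* Hypothetical driver helper: ask for a WC MTRR but treat any failure,
 * including -ENODEV when MTRRs are disabled, as "run without write-combining". */
static void mydrv_try_wc(unsigned long base, unsigned long size)
{
        int reg = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);

        if (reg < 0)
                pr_info("mydrv: no WC MTRR (%d), continuing uncombined\n", reg);
}

New code would normally go through arch_phys_wc_add() further below rather than calling mtrr_add() directly.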
@@ -538,6 +548,9 @@ EXPORT_SYMBOL(mtrr_del);
538 * attempts to add a WC MTRR covering size bytes starting at base and 548 * attempts to add a WC MTRR covering size bytes starting at base and
539 * logs an error if this fails. 549 * logs an error if this fails.
540 * 550 *
 551 * The caller should provide a power of two size on an equivalent
552 * power of two boundary.
553 *
541 * Drivers must store the return value to pass to mtrr_del_wc_if_needed, 554 * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
542 * but drivers should not try to interpret that return value. 555 * but drivers should not try to interpret that return value.
543 */ 556 */
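The note added above asks callers to pass a power-of-two size on a matching power-of-two boundary. A hedged driver-style sketch of the intended pairing with ioremap_wc() - the mydev structure and framebuffer names are hypothetical:

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/log2.h>

struct mydev {
        void __iomem *fb;
        int wc_cookie;  /* opaque; only ever handed back to arch_phys_wc_del() */
};

static int mydev_map_fb(struct mydev *d, unsigned long base, unsigned long len)
{
        d->fb = ioremap_wc(base, len);
        if (!d->fb)
                return -ENOMEM;

        /* BAR bases are normally naturally aligned; round the length up to a
         * power of two as requested above.  This is a no-op when PAT is
         * enabled or MTRRs are disabled. */
        d->wc_cookie = arch_phys_wc_add(base, roundup_pow_of_two(len));
        return 0;
}

static void mydev_unmap_fb(struct mydev *d)
{
        arch_phys_wc_del(d->wc_cookie);
        iounmap(d->fb);
}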
@@ -545,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
545{ 558{
546 int ret; 559 int ret;
547 560
548 if (pat_enabled) 561 if (pat_enabled() || !mtrr_enabled())
549 return 0; /* Success! (We don't need to do anything.) */ 562 return 0; /* Success! (We don't need to do anything.) */
550 563
551 ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true); 564 ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
@@ -577,7 +590,7 @@ void arch_phys_wc_del(int handle)
577EXPORT_SYMBOL(arch_phys_wc_del); 590EXPORT_SYMBOL(arch_phys_wc_del);
578 591
579/* 592/*
580 * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value 593 * arch_phys_wc_index - translates arch_phys_wc_add's return value
581 * @handle: Return value from arch_phys_wc_add 594 * @handle: Return value from arch_phys_wc_add
582 * 595 *
583 * This will turn the return value from arch_phys_wc_add into an mtrr 596 * This will turn the return value from arch_phys_wc_add into an mtrr
@@ -587,14 +600,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
587 * in printk line. Alas there is an illegitimate use in some ancient 600 * in printk line. Alas there is an illegitimate use in some ancient
588 * drm ioctls. 601 * drm ioctls.
589 */ 602 */
590int phys_wc_to_mtrr_index(int handle) 603int arch_phys_wc_index(int handle)
591{ 604{
592 if (handle < MTRR_TO_PHYS_WC_OFFSET) 605 if (handle < MTRR_TO_PHYS_WC_OFFSET)
593 return -1; 606 return -1;
594 else 607 else
595 return handle - MTRR_TO_PHYS_WC_OFFSET; 608 return handle - MTRR_TO_PHYS_WC_OFFSET;
596} 609}
597EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index); 610EXPORT_SYMBOL_GPL(arch_phys_wc_index);
598 611
599/* 612/*
600 * HACK ALERT! 613 * HACK ALERT!
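The renamed arch_phys_wc_index() keeps its narrow purpose: turning the cookie from arch_phys_wc_add() into an MTRR number for debugging output only. A minimal hypothetical use:

#include <linux/io.h>
#include <linux/printk.h>

static void mydrv_report_wc(int wc_cookie)
{
        int reg = arch_phys_wc_index(wc_cookie);

        if (reg < 0)    /* no MTRR behind this cookie: PAT in use, or the add failed */
                pr_debug("write-combining via PAT, no MTRR consumed\n");
        else
                pr_debug("write-combining via MTRR %d\n", reg);
}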
@@ -734,10 +747,12 @@ void __init mtrr_bp_init(void)
734 } 747 }
735 748
736 if (mtrr_if) { 749 if (mtrr_if) {
750 __mtrr_enabled = true;
737 set_num_var_ranges(); 751 set_num_var_ranges();
738 init_table(); 752 init_table();
739 if (use_intel()) { 753 if (use_intel()) {
740 get_mtrr_state(); 754 /* BIOS may override */
755 __mtrr_enabled = get_mtrr_state();
741 756
742 if (mtrr_cleanup(phys_addr)) { 757 if (mtrr_cleanup(phys_addr)) {
743 changed_by_mtrr_cleanup = 1; 758 changed_by_mtrr_cleanup = 1;
@@ -745,10 +760,16 @@ void __init mtrr_bp_init(void)
745 } 760 }
746 } 761 }
747 } 762 }
763
764 if (!mtrr_enabled())
765 pr_info("MTRR: Disabled\n");
748} 766}
749 767
750void mtrr_ap_init(void) 768void mtrr_ap_init(void)
751{ 769{
770 if (!mtrr_enabled())
771 return;
772
752 if (!use_intel() || mtrr_aps_delayed_init) 773 if (!use_intel() || mtrr_aps_delayed_init)
753 return; 774 return;
754 /* 775 /*
@@ -774,6 +795,9 @@ void mtrr_save_state(void)
774{ 795{
775 int first_cpu; 796 int first_cpu;
776 797
798 if (!mtrr_enabled())
799 return;
800
777 get_online_cpus(); 801 get_online_cpus();
778 first_cpu = cpumask_first(cpu_online_mask); 802 first_cpu = cpumask_first(cpu_online_mask);
779 smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1); 803 smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
@@ -782,6 +806,8 @@ void mtrr_save_state(void)
782 806
783void set_mtrr_aps_delayed_init(void) 807void set_mtrr_aps_delayed_init(void)
784{ 808{
809 if (!mtrr_enabled())
810 return;
785 if (!use_intel()) 811 if (!use_intel())
786 return; 812 return;
787 813
@@ -793,7 +819,7 @@ void set_mtrr_aps_delayed_init(void)
793 */ 819 */
794void mtrr_aps_init(void) 820void mtrr_aps_init(void)
795{ 821{
796 if (!use_intel()) 822 if (!use_intel() || !mtrr_enabled())
797 return; 823 return;
798 824
799 /* 825 /*
@@ -810,7 +836,7 @@ void mtrr_aps_init(void)
810 836
811void mtrr_bp_restore(void) 837void mtrr_bp_restore(void)
812{ 838{
813 if (!use_intel()) 839 if (!use_intel() || !mtrr_enabled())
814 return; 840 return;
815 841
816 mtrr_if->set_all(); 842 mtrr_if->set_all();
@@ -818,7 +844,7 @@ void mtrr_bp_restore(void)
818 844
819static int __init mtrr_init_finialize(void) 845static int __init mtrr_init_finialize(void)
820{ 846{
821 if (!mtrr_if) 847 if (!mtrr_enabled())
822 return 0; 848 return 0;
823 849
824 if (use_intel()) { 850 if (use_intel()) {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index df5e41f31a27..951884dcc433 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,7 +51,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
51 51
52void fill_mtrr_var_range(unsigned int index, 52void fill_mtrr_var_range(unsigned int index,
53 u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi); 53 u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
54void get_mtrr_state(void); 54bool get_mtrr_state(void);
55 55
56extern void set_mtrr_ops(const struct mtrr_ops *ops); 56extern void set_mtrr_ops(const struct mtrr_ops *ops);
57 57
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c76d3e37c6e1..e068d6683dba 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -22,6 +22,7 @@
22#include <linux/elfcore.h> 22#include <linux/elfcore.h>
23#include <linux/module.h> 23#include <linux/module.h>
24#include <linux/slab.h> 24#include <linux/slab.h>
25#include <linux/vmalloc.h>
25 26
26#include <asm/processor.h> 27#include <asm/processor.h>
27#include <asm/hardirq.h> 28#include <asm/hardirq.h>
diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c
index 6367a780cc8c..5ee771859b6f 100644
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -4,7 +4,6 @@
4#include <linux/bootmem.h> 4#include <linux/bootmem.h>
5#include <linux/export.h> 5#include <linux/export.h>
6#include <linux/io.h> 6#include <linux/io.h>
7#include <linux/irqdomain.h>
8#include <linux/interrupt.h> 7#include <linux/interrupt.h>
9#include <linux/list.h> 8#include <linux/list.h>
10#include <linux/of.h> 9#include <linux/of.h>
@@ -17,6 +16,7 @@
17#include <linux/of_pci.h> 16#include <linux/of_pci.h>
18#include <linux/initrd.h> 17#include <linux/initrd.h>
19 18
19#include <asm/irqdomain.h>
20#include <asm/hpet.h> 20#include <asm/hpet.h>
21#include <asm/apic.h> 21#include <asm/apic.h>
22#include <asm/pci_x86.h> 22#include <asm/pci_x86.h>
@@ -196,38 +196,31 @@ static struct of_ioapic_type of_ioapic_type[] =
196 }, 196 },
197}; 197};
198 198
199static int ioapic_xlate(struct irq_domain *domain, 199static int dt_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
200 struct device_node *controller, 200 unsigned int nr_irqs, void *arg)
201 const u32 *intspec, u32 intsize,
202 irq_hw_number_t *out_hwirq, u32 *out_type)
203{ 201{
202 struct of_phandle_args *irq_data = (void *)arg;
204 struct of_ioapic_type *it; 203 struct of_ioapic_type *it;
205 u32 line, idx, gsi; 204 struct irq_alloc_info tmp;
206 205
207 if (WARN_ON(intsize < 2)) 206 if (WARN_ON(irq_data->args_count < 2))
208 return -EINVAL; 207 return -EINVAL;
209 208 if (irq_data->args[1] >= ARRAY_SIZE(of_ioapic_type))
210 line = intspec[0];
211
212 if (intspec[1] >= ARRAY_SIZE(of_ioapic_type))
213 return -EINVAL; 209 return -EINVAL;
214 210
215 it = &of_ioapic_type[intspec[1]]; 211 it = &of_ioapic_type[irq_data->args[1]];
212 ioapic_set_alloc_attr(&tmp, NUMA_NO_NODE, it->trigger, it->polarity);
213 tmp.ioapic_id = mpc_ioapic_id(mp_irqdomain_ioapic_idx(domain));
214 tmp.ioapic_pin = irq_data->args[0];
216 215
217 idx = (u32)(long)domain->host_data; 216 return mp_irqdomain_alloc(domain, virq, nr_irqs, &tmp);
218 gsi = mp_pin_to_gsi(idx, line);
219 if (mp_set_gsi_attr(gsi, it->trigger, it->polarity, cpu_to_node(0)))
220 return -EBUSY;
221
222 *out_hwirq = line;
223 *out_type = it->out_type;
224 return 0;
225} 217}
226 218
227const struct irq_domain_ops ioapic_irq_domain_ops = { 219static const struct irq_domain_ops ioapic_irq_domain_ops = {
228 .map = mp_irqdomain_map, 220 .alloc = dt_irqdomain_alloc,
229 .unmap = mp_irqdomain_unmap, 221 .free = mp_irqdomain_free,
230 .xlate = ioapic_xlate, 222 .activate = mp_irqdomain_activate,
223 .deactivate = mp_irqdomain_deactivate,
231}; 224};
232 225
233static void __init dtb_add_ioapic(struct device_node *dn) 226static void __init dtb_add_ioapic(struct device_node *dn)
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index fe9f0b79a18b..5cb9a4d6f623 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -627,8 +627,12 @@ static struct chipset early_qrk[] __initdata = {
627 { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA, PCI_ANY_ID, 627 { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA, PCI_ANY_ID,
628 QFLAG_APPLY_ONCE, intel_graphics_stolen }, 628 QFLAG_APPLY_ONCE, intel_graphics_stolen },
629 /* 629 /*
630 * HPET on current version of Baytrail platform has accuracy 630 * HPET on the current version of the Baytrail platform has accuracy
631 * problems, disable it for now: 631 * problems: it will halt in deep idle state - so we disable it.
632 *
633 * More details can be found in section 18.10.1.3 of the datasheet:
634 *
635 * http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-z8000-datasheet-vol-1.pdf
632 */ 636 */
633 { PCI_VENDOR_ID_INTEL, 0x0f00, 637 { PCI_VENDOR_ID_INTEL, 0x0f00,
634 PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet}, 638 PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
deleted file mode 100644
index 1c309763e321..000000000000
--- a/arch/x86/kernel/entry_32.S
+++ /dev/null
@@ -1,1401 +0,0 @@
1/*
2 *
3 * Copyright (C) 1991, 1992 Linus Torvalds
4 */
5
6/*
7 * entry.S contains the system-call and fault low-level handling routines.
8 * This also contains the timer-interrupt handler, as well as all interrupts
9 * and faults that can result in a task-switch.
10 *
11 * NOTE: This code handles signal-recognition, which happens every time
12 * after a timer-interrupt and after each system call.
13 *
14 * I changed all the .align's to 4 (16 byte alignment), as that's faster
15 * on a 486.
16 *
17 * Stack layout in 'syscall_exit':
18 * ptrace needs to have all regs on the stack.
19 * if the order here is changed, it needs to be
20 * updated in fork.c:copy_process, signal.c:do_signal,
21 * ptrace.c and ptrace.h
22 *
23 * 0(%esp) - %ebx
24 * 4(%esp) - %ecx
25 * 8(%esp) - %edx
26 * C(%esp) - %esi
27 * 10(%esp) - %edi
28 * 14(%esp) - %ebp
29 * 18(%esp) - %eax
30 * 1C(%esp) - %ds
31 * 20(%esp) - %es
32 * 24(%esp) - %fs
33 * 28(%esp) - %gs saved iff !CONFIG_X86_32_LAZY_GS
34 * 2C(%esp) - orig_eax
35 * 30(%esp) - %eip
36 * 34(%esp) - %cs
37 * 38(%esp) - %eflags
38 * 3C(%esp) - %oldesp
39 * 40(%esp) - %oldss
40 *
41 * "current" is in register %ebx during any slow entries.
42 */
43
44#include <linux/linkage.h>
45#include <linux/err.h>
46#include <asm/thread_info.h>
47#include <asm/irqflags.h>
48#include <asm/errno.h>
49#include <asm/segment.h>
50#include <asm/smp.h>
51#include <asm/page_types.h>
52#include <asm/percpu.h>
53#include <asm/dwarf2.h>
54#include <asm/processor-flags.h>
55#include <asm/ftrace.h>
56#include <asm/irq_vectors.h>
57#include <asm/cpufeature.h>
58#include <asm/alternative-asm.h>
59#include <asm/asm.h>
60#include <asm/smap.h>
61
62/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
63#include <linux/elf-em.h>
64#define AUDIT_ARCH_I386 (EM_386|__AUDIT_ARCH_LE)
65#define __AUDIT_ARCH_LE 0x40000000
66
67#ifndef CONFIG_AUDITSYSCALL
68#define sysenter_audit syscall_trace_entry
69#define sysexit_audit syscall_exit_work
70#endif
71
72 .section .entry.text, "ax"
73
74/*
75 * We use macros for low-level operations which need to be overridden
76 * for paravirtualization. The following will never clobber any registers:
77 * INTERRUPT_RETURN (aka. "iret")
78 * GET_CR0_INTO_EAX (aka. "movl %cr0, %eax")
79 * ENABLE_INTERRUPTS_SYSEXIT (aka "sti; sysexit").
80 *
81 * For DISABLE_INTERRUPTS/ENABLE_INTERRUPTS (aka "cli"/"sti"), you must
82 * specify what registers can be overwritten (CLBR_NONE, CLBR_EAX/EDX/ECX/ANY).
83 * Allowing a register to be clobbered can shrink the paravirt replacement
84 * enough to patch inline, increasing performance.
85 */
86
87#ifdef CONFIG_PREEMPT
88#define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
89#else
90#define preempt_stop(clobbers)
91#define resume_kernel restore_all
92#endif
93
94.macro TRACE_IRQS_IRET
95#ifdef CONFIG_TRACE_IRQFLAGS
96 testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off?
97 jz 1f
98 TRACE_IRQS_ON
991:
100#endif
101.endm
102
103/*
104 * User gs save/restore
105 *
106 * %gs is used for userland TLS and kernel only uses it for stack
107 * canary which is required to be at %gs:20 by gcc. Read the comment
108 * at the top of stackprotector.h for more info.
109 *
110 * Local labels 98 and 99 are used.
111 */
112#ifdef CONFIG_X86_32_LAZY_GS
113
114 /* unfortunately push/pop can't be no-op */
115.macro PUSH_GS
116 pushl_cfi $0
117.endm
118.macro POP_GS pop=0
119 addl $(4 + \pop), %esp
120 CFI_ADJUST_CFA_OFFSET -(4 + \pop)
121.endm
122.macro POP_GS_EX
123.endm
124
125 /* all the rest are no-op */
126.macro PTGS_TO_GS
127.endm
128.macro PTGS_TO_GS_EX
129.endm
130.macro GS_TO_REG reg
131.endm
132.macro REG_TO_PTGS reg
133.endm
134.macro SET_KERNEL_GS reg
135.endm
136
137#else /* CONFIG_X86_32_LAZY_GS */
138
139.macro PUSH_GS
140 pushl_cfi %gs
141 /*CFI_REL_OFFSET gs, 0*/
142.endm
143
144.macro POP_GS pop=0
14598: popl_cfi %gs
146 /*CFI_RESTORE gs*/
147 .if \pop <> 0
148 add $\pop, %esp
149 CFI_ADJUST_CFA_OFFSET -\pop
150 .endif
151.endm
152.macro POP_GS_EX
153.pushsection .fixup, "ax"
15499: movl $0, (%esp)
155 jmp 98b
156.popsection
157 _ASM_EXTABLE(98b,99b)
158.endm
159
160.macro PTGS_TO_GS
16198: mov PT_GS(%esp), %gs
162.endm
163.macro PTGS_TO_GS_EX
164.pushsection .fixup, "ax"
16599: movl $0, PT_GS(%esp)
166 jmp 98b
167.popsection
168 _ASM_EXTABLE(98b,99b)
169.endm
170
171.macro GS_TO_REG reg
172 movl %gs, \reg
173 /*CFI_REGISTER gs, \reg*/
174.endm
175.macro REG_TO_PTGS reg
176 movl \reg, PT_GS(%esp)
177 /*CFI_REL_OFFSET gs, PT_GS*/
178.endm
179.macro SET_KERNEL_GS reg
180 movl $(__KERNEL_STACK_CANARY), \reg
181 movl \reg, %gs
182.endm
183
184#endif /* CONFIG_X86_32_LAZY_GS */
185
186.macro SAVE_ALL
187 cld
188 PUSH_GS
189 pushl_cfi %fs
190 /*CFI_REL_OFFSET fs, 0;*/
191 pushl_cfi %es
192 /*CFI_REL_OFFSET es, 0;*/
193 pushl_cfi %ds
194 /*CFI_REL_OFFSET ds, 0;*/
195 pushl_cfi %eax
196 CFI_REL_OFFSET eax, 0
197 pushl_cfi %ebp
198 CFI_REL_OFFSET ebp, 0
199 pushl_cfi %edi
200 CFI_REL_OFFSET edi, 0
201 pushl_cfi %esi
202 CFI_REL_OFFSET esi, 0
203 pushl_cfi %edx
204 CFI_REL_OFFSET edx, 0
205 pushl_cfi %ecx
206 CFI_REL_OFFSET ecx, 0
207 pushl_cfi %ebx
208 CFI_REL_OFFSET ebx, 0
209 movl $(__USER_DS), %edx
210 movl %edx, %ds
211 movl %edx, %es
212 movl $(__KERNEL_PERCPU), %edx
213 movl %edx, %fs
214 SET_KERNEL_GS %edx
215.endm
216
217.macro RESTORE_INT_REGS
218 popl_cfi %ebx
219 CFI_RESTORE ebx
220 popl_cfi %ecx
221 CFI_RESTORE ecx
222 popl_cfi %edx
223 CFI_RESTORE edx
224 popl_cfi %esi
225 CFI_RESTORE esi
226 popl_cfi %edi
227 CFI_RESTORE edi
228 popl_cfi %ebp
229 CFI_RESTORE ebp
230 popl_cfi %eax
231 CFI_RESTORE eax
232.endm
233
234.macro RESTORE_REGS pop=0
235 RESTORE_INT_REGS
2361: popl_cfi %ds
237 /*CFI_RESTORE ds;*/
2382: popl_cfi %es
239 /*CFI_RESTORE es;*/
2403: popl_cfi %fs
241 /*CFI_RESTORE fs;*/
242 POP_GS \pop
243.pushsection .fixup, "ax"
2444: movl $0, (%esp)
245 jmp 1b
2465: movl $0, (%esp)
247 jmp 2b
2486: movl $0, (%esp)
249 jmp 3b
250.popsection
251 _ASM_EXTABLE(1b,4b)
252 _ASM_EXTABLE(2b,5b)
253 _ASM_EXTABLE(3b,6b)
254 POP_GS_EX
255.endm
256
257.macro RING0_INT_FRAME
258 CFI_STARTPROC simple
259 CFI_SIGNAL_FRAME
260 CFI_DEF_CFA esp, 3*4
261 /*CFI_OFFSET cs, -2*4;*/
262 CFI_OFFSET eip, -3*4
263.endm
264
265.macro RING0_EC_FRAME
266 CFI_STARTPROC simple
267 CFI_SIGNAL_FRAME
268 CFI_DEF_CFA esp, 4*4
269 /*CFI_OFFSET cs, -2*4;*/
270 CFI_OFFSET eip, -3*4
271.endm
272
273.macro RING0_PTREGS_FRAME
274 CFI_STARTPROC simple
275 CFI_SIGNAL_FRAME
276 CFI_DEF_CFA esp, PT_OLDESP-PT_EBX
277 /*CFI_OFFSET cs, PT_CS-PT_OLDESP;*/
278 CFI_OFFSET eip, PT_EIP-PT_OLDESP
279 /*CFI_OFFSET es, PT_ES-PT_OLDESP;*/
280 /*CFI_OFFSET ds, PT_DS-PT_OLDESP;*/
281 CFI_OFFSET eax, PT_EAX-PT_OLDESP
282 CFI_OFFSET ebp, PT_EBP-PT_OLDESP
283 CFI_OFFSET edi, PT_EDI-PT_OLDESP
284 CFI_OFFSET esi, PT_ESI-PT_OLDESP
285 CFI_OFFSET edx, PT_EDX-PT_OLDESP
286 CFI_OFFSET ecx, PT_ECX-PT_OLDESP
287 CFI_OFFSET ebx, PT_EBX-PT_OLDESP
288.endm
289
290ENTRY(ret_from_fork)
291 CFI_STARTPROC
292 pushl_cfi %eax
293 call schedule_tail
294 GET_THREAD_INFO(%ebp)
295 popl_cfi %eax
296 pushl_cfi $0x0202 # Reset kernel eflags
297 popfl_cfi
298 jmp syscall_exit
299 CFI_ENDPROC
300END(ret_from_fork)
301
302ENTRY(ret_from_kernel_thread)
303 CFI_STARTPROC
304 pushl_cfi %eax
305 call schedule_tail
306 GET_THREAD_INFO(%ebp)
307 popl_cfi %eax
308 pushl_cfi $0x0202 # Reset kernel eflags
309 popfl_cfi
310 movl PT_EBP(%esp),%eax
311 call *PT_EBX(%esp)
312 movl $0,PT_EAX(%esp)
313 jmp syscall_exit
314 CFI_ENDPROC
315ENDPROC(ret_from_kernel_thread)
316
317/*
318 * Return to user mode is not as complex as all this looks,
319 * but we want the default path for a system call return to
320 * go as quickly as possible which is why some of this is
321 * less clear than it otherwise should be.
322 */
323
324 # userspace resumption stub bypassing syscall exit tracing
325 ALIGN
326 RING0_PTREGS_FRAME
327ret_from_exception:
328 preempt_stop(CLBR_ANY)
329ret_from_intr:
330 GET_THREAD_INFO(%ebp)
331#ifdef CONFIG_VM86
332 movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS
333 movb PT_CS(%esp), %al
334 andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
335#else
336 /*
337 * We can be coming here from child spawned by kernel_thread().
338 */
339 movl PT_CS(%esp), %eax
340 andl $SEGMENT_RPL_MASK, %eax
341#endif
342 cmpl $USER_RPL, %eax
343 jb resume_kernel # not returning to v8086 or userspace
344
345ENTRY(resume_userspace)
346 LOCKDEP_SYS_EXIT
347 DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
348 # setting need_resched or sigpending
349 # between sampling and the iret
350 TRACE_IRQS_OFF
351 movl TI_flags(%ebp), %ecx
352 andl $_TIF_WORK_MASK, %ecx # is there any work to be done on
353 # int/exception return?
354 jne work_pending
355 jmp restore_all
356END(ret_from_exception)
357
358#ifdef CONFIG_PREEMPT
359ENTRY(resume_kernel)
360 DISABLE_INTERRUPTS(CLBR_ANY)
361need_resched:
362 cmpl $0,PER_CPU_VAR(__preempt_count)
363 jnz restore_all
364 testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path) ?
365 jz restore_all
366 call preempt_schedule_irq
367 jmp need_resched
368END(resume_kernel)
369#endif
370 CFI_ENDPROC
371
372/* SYSENTER_RETURN points to after the "sysenter" instruction in
373 the vsyscall page. See vsyscall-sysentry.S, which defines the symbol. */
374
375 # sysenter call handler stub
376ENTRY(ia32_sysenter_target)
377 CFI_STARTPROC simple
378 CFI_SIGNAL_FRAME
379 CFI_DEF_CFA esp, 0
380 CFI_REGISTER esp, ebp
381 movl TSS_sysenter_sp0(%esp),%esp
382sysenter_past_esp:
383 /*
384 * Interrupts are disabled here, but we can't trace it until
385 * enough kernel state to call TRACE_IRQS_OFF can be called - but
386 * we immediately enable interrupts at that point anyway.
387 */
388 pushl_cfi $__USER_DS
389 /*CFI_REL_OFFSET ss, 0*/
390 pushl_cfi %ebp
391 CFI_REL_OFFSET esp, 0
392 pushfl_cfi
393 orl $X86_EFLAGS_IF, (%esp)
394 pushl_cfi $__USER_CS
395 /*CFI_REL_OFFSET cs, 0*/
396 /*
397 * Push current_thread_info()->sysenter_return to the stack.
398 * A tiny bit of offset fixup is necessary: TI_sysenter_return
399 * is relative to thread_info, which is at the bottom of the
400 * kernel stack page. 4*4 means the 4 words pushed above;
401 * TOP_OF_KERNEL_STACK_PADDING takes us to the top of the stack;
402 * and THREAD_SIZE takes us to the bottom.
403 */
404 pushl_cfi ((TI_sysenter_return) - THREAD_SIZE + TOP_OF_KERNEL_STACK_PADDING + 4*4)(%esp)
405 CFI_REL_OFFSET eip, 0
406
407 pushl_cfi %eax
408 SAVE_ALL
409 ENABLE_INTERRUPTS(CLBR_NONE)
410
411/*
412 * Load the potential sixth argument from user stack.
413 * Careful about security.
414 */
415 cmpl $__PAGE_OFFSET-3,%ebp
416 jae syscall_fault
417 ASM_STAC
4181: movl (%ebp),%ebp
419 ASM_CLAC
420 movl %ebp,PT_EBP(%esp)
421 _ASM_EXTABLE(1b,syscall_fault)
422
423 GET_THREAD_INFO(%ebp)
424
425 testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
426 jnz sysenter_audit
427sysenter_do_call:
428 cmpl $(NR_syscalls), %eax
429 jae sysenter_badsys
430 call *sys_call_table(,%eax,4)
431sysenter_after_call:
432 movl %eax,PT_EAX(%esp)
433 LOCKDEP_SYS_EXIT
434 DISABLE_INTERRUPTS(CLBR_ANY)
435 TRACE_IRQS_OFF
436 movl TI_flags(%ebp), %ecx
437 testl $_TIF_ALLWORK_MASK, %ecx
438 jnz sysexit_audit
439sysenter_exit:
440/* if something modifies registers it must also disable sysexit */
441 movl PT_EIP(%esp), %edx
442 movl PT_OLDESP(%esp), %ecx
443 xorl %ebp,%ebp
444 TRACE_IRQS_ON
4451: mov PT_FS(%esp), %fs
446 PTGS_TO_GS
447 ENABLE_INTERRUPTS_SYSEXIT
448
449#ifdef CONFIG_AUDITSYSCALL
450sysenter_audit:
451 testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
452 jnz syscall_trace_entry
453 /* movl PT_EAX(%esp), %eax already set, syscall number: 1st arg to audit */
454 movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
455 /* movl PT_ECX(%esp), %ecx already set, a1: 3nd arg to audit */
456 pushl_cfi PT_ESI(%esp) /* a3: 5th arg */
457 pushl_cfi PT_EDX+4(%esp) /* a2: 4th arg */
458 call __audit_syscall_entry
459 popl_cfi %ecx /* get that remapped edx off the stack */
460 popl_cfi %ecx /* get that remapped esi off the stack */
461 movl PT_EAX(%esp),%eax /* reload syscall number */
462 jmp sysenter_do_call
463
464sysexit_audit:
465 testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
466 jnz syscall_exit_work
467 TRACE_IRQS_ON
468 ENABLE_INTERRUPTS(CLBR_ANY)
469 movl %eax,%edx /* second arg, syscall return value */
470 cmpl $-MAX_ERRNO,%eax /* is it an error ? */
471 setbe %al /* 1 if so, 0 if not */
472 movzbl %al,%eax /* zero-extend that */
473 call __audit_syscall_exit
474 DISABLE_INTERRUPTS(CLBR_ANY)
475 TRACE_IRQS_OFF
476 movl TI_flags(%ebp), %ecx
477 testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
478 jnz syscall_exit_work
479 movl PT_EAX(%esp),%eax /* reload syscall return value */
480 jmp sysenter_exit
481#endif
482
483 CFI_ENDPROC
484.pushsection .fixup,"ax"
4852: movl $0,PT_FS(%esp)
486 jmp 1b
487.popsection
488 _ASM_EXTABLE(1b,2b)
489 PTGS_TO_GS_EX
490ENDPROC(ia32_sysenter_target)
491
492 # system call handler stub
493ENTRY(system_call)
494 RING0_INT_FRAME # can't unwind into user space anyway
495 ASM_CLAC
496 pushl_cfi %eax # save orig_eax
497 SAVE_ALL
498 GET_THREAD_INFO(%ebp)
499 # system call tracing in operation / emulation
500 testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
501 jnz syscall_trace_entry
502 cmpl $(NR_syscalls), %eax
503 jae syscall_badsys
504syscall_call:
505 call *sys_call_table(,%eax,4)
506syscall_after_call:
507 movl %eax,PT_EAX(%esp) # store the return value
508syscall_exit:
509 LOCKDEP_SYS_EXIT
510 DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
511 # setting need_resched or sigpending
512 # between sampling and the iret
513 TRACE_IRQS_OFF
514 movl TI_flags(%ebp), %ecx
515 testl $_TIF_ALLWORK_MASK, %ecx # current->work
516 jnz syscall_exit_work
517
518restore_all:
519 TRACE_IRQS_IRET
520restore_all_notrace:
521#ifdef CONFIG_X86_ESPFIX32
522 movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS
523 # Warning: PT_OLDSS(%esp) contains the wrong/random values if we
524 # are returning to the kernel.
525 # See comments in process.c:copy_thread() for details.
526 movb PT_OLDSS(%esp), %ah
527 movb PT_CS(%esp), %al
528 andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
529 cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
530 CFI_REMEMBER_STATE
531 je ldt_ss # returning to user-space with LDT SS
532#endif
533restore_nocheck:
534 RESTORE_REGS 4 # skip orig_eax/error_code
535irq_return:
536 INTERRUPT_RETURN
537.section .fixup,"ax"
538ENTRY(iret_exc)
539 pushl $0 # no error code
540 pushl $do_iret_error
541 jmp error_code
542.previous
543 _ASM_EXTABLE(irq_return,iret_exc)
544
545#ifdef CONFIG_X86_ESPFIX32
546 CFI_RESTORE_STATE
547ldt_ss:
548#ifdef CONFIG_PARAVIRT
549 /*
550 * The kernel can't run on a non-flat stack if paravirt mode
551 * is active. Rather than try to fixup the high bits of
552 * ESP, bypass this code entirely. This may break DOSemu
553 * and/or Wine support in a paravirt VM, although the option
554 * is still available to implement the setting of the high
555 * 16-bits in the INTERRUPT_RETURN paravirt-op.
556 */
557 cmpl $0, pv_info+PARAVIRT_enabled
558 jne restore_nocheck
559#endif
560
561/*
562 * Setup and switch to ESPFIX stack
563 *
564 * We're returning to userspace with a 16 bit stack. The CPU will not
565 * restore the high word of ESP for us on executing iret... This is an
566 * "official" bug of all the x86-compatible CPUs, which we can work
567 * around to make dosemu and wine happy. We do this by preloading the
568 * high word of ESP with the high word of the userspace ESP while
569 * compensating for the offset by changing to the ESPFIX segment with
570 * a base address that matches for the difference.
571 */
572#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
573 mov %esp, %edx /* load kernel esp */
574 mov PT_OLDESP(%esp), %eax /* load userspace esp */
575 mov %dx, %ax /* eax: new kernel esp */
576 sub %eax, %edx /* offset (low word is 0) */
577 shr $16, %edx
578 mov %dl, GDT_ESPFIX_SS + 4 /* bits 16..23 */
579 mov %dh, GDT_ESPFIX_SS + 7 /* bits 24..31 */
580 pushl_cfi $__ESPFIX_SS
581 pushl_cfi %eax /* new kernel esp */
582 /* Disable interrupts, but do not irqtrace this section: we
583 * will soon execute iret and the tracer was already set to
584 * the irqstate after the iret */
585 DISABLE_INTERRUPTS(CLBR_EAX)
586 lss (%esp), %esp /* switch to espfix segment */
587 CFI_ADJUST_CFA_OFFSET -8
588 jmp restore_nocheck
589#endif
590 CFI_ENDPROC
591ENDPROC(system_call)
592
593 # perform work that needs to be done immediately before resumption
594 ALIGN
595 RING0_PTREGS_FRAME # can't unwind into user space anyway
596work_pending:
597 testb $_TIF_NEED_RESCHED, %cl
598 jz work_notifysig
599work_resched:
600 call schedule
601 LOCKDEP_SYS_EXIT
602 DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
603 # setting need_resched or sigpending
604 # between sampling and the iret
605 TRACE_IRQS_OFF
606 movl TI_flags(%ebp), %ecx
607 andl $_TIF_WORK_MASK, %ecx # is there any work to be done other
608 # than syscall tracing?
609 jz restore_all
610 testb $_TIF_NEED_RESCHED, %cl
611 jnz work_resched
612
613work_notifysig: # deal with pending signals and
614 # notify-resume requests
615#ifdef CONFIG_VM86
616 testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)
617 movl %esp, %eax
618 jnz work_notifysig_v86 # returning to kernel-space or
619 # vm86-space
6201:
621#else
622 movl %esp, %eax
623#endif
624 TRACE_IRQS_ON
625 ENABLE_INTERRUPTS(CLBR_NONE)
626 movb PT_CS(%esp), %bl
627 andb $SEGMENT_RPL_MASK, %bl
628 cmpb $USER_RPL, %bl
629 jb resume_kernel
630 xorl %edx, %edx
631 call do_notify_resume
632 jmp resume_userspace
633
634#ifdef CONFIG_VM86
635 ALIGN
636work_notifysig_v86:
637 pushl_cfi %ecx # save ti_flags for do_notify_resume
638 call save_v86_state # %eax contains pt_regs pointer
639 popl_cfi %ecx
640 movl %eax, %esp
641 jmp 1b
642#endif
643END(work_pending)
644
645 # perform syscall exit tracing
646 ALIGN
647syscall_trace_entry:
648 movl $-ENOSYS,PT_EAX(%esp)
649 movl %esp, %eax
650 call syscall_trace_enter
651 /* What it returned is what we'll actually use. */
652 cmpl $(NR_syscalls), %eax
653 jnae syscall_call
654 jmp syscall_exit
655END(syscall_trace_entry)
656
657 # perform syscall exit tracing
658 ALIGN
659syscall_exit_work:
660 testl $_TIF_WORK_SYSCALL_EXIT, %ecx
661 jz work_pending
662 TRACE_IRQS_ON
663 ENABLE_INTERRUPTS(CLBR_ANY) # could let syscall_trace_leave() call
664 # schedule() instead
665 movl %esp, %eax
666 call syscall_trace_leave
667 jmp resume_userspace
668END(syscall_exit_work)
669 CFI_ENDPROC
670
671 RING0_INT_FRAME # can't unwind into user space anyway
672syscall_fault:
673 ASM_CLAC
674 GET_THREAD_INFO(%ebp)
675 movl $-EFAULT,PT_EAX(%esp)
676 jmp resume_userspace
677END(syscall_fault)
678
679syscall_badsys:
680 movl $-ENOSYS,%eax
681 jmp syscall_after_call
682END(syscall_badsys)
683
684sysenter_badsys:
685 movl $-ENOSYS,%eax
686 jmp sysenter_after_call
687END(sysenter_badsys)
688 CFI_ENDPROC
689
690.macro FIXUP_ESPFIX_STACK
691/*
692 * Switch back for ESPFIX stack to the normal zerobased stack
693 *
694 * We can't call C functions using the ESPFIX stack. This code reads
695 * the high word of the segment base from the GDT and swiches to the
696 * normal stack and adjusts ESP with the matching offset.
697 */
698#ifdef CONFIG_X86_ESPFIX32
699 /* fixup the stack */
700 mov GDT_ESPFIX_SS + 4, %al /* bits 16..23 */
701 mov GDT_ESPFIX_SS + 7, %ah /* bits 24..31 */
702 shl $16, %eax
703 addl %esp, %eax /* the adjusted stack pointer */
704 pushl_cfi $__KERNEL_DS
705 pushl_cfi %eax
706 lss (%esp), %esp /* switch to the normal stack segment */
707 CFI_ADJUST_CFA_OFFSET -8
708#endif
709.endm
710.macro UNWIND_ESPFIX_STACK
711#ifdef CONFIG_X86_ESPFIX32
712 movl %ss, %eax
713 /* see if on espfix stack */
714 cmpw $__ESPFIX_SS, %ax
715 jne 27f
716 movl $__KERNEL_DS, %eax
717 movl %eax, %ds
718 movl %eax, %es
719 /* switch to normal stack */
720 FIXUP_ESPFIX_STACK
72127:
722#endif
723.endm
724
725/*
726 * Build the entry stubs with some assembler magic.
727 * We pack 1 stub into every 8-byte block.
728 */
729 .align 8
730ENTRY(irq_entries_start)
731 RING0_INT_FRAME
732 vector=FIRST_EXTERNAL_VECTOR
733 .rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
734 pushl_cfi $(~vector+0x80) /* Note: always in signed byte range */
735 vector=vector+1
736 jmp common_interrupt
737 CFI_ADJUST_CFA_OFFSET -4
738 .align 8
739 .endr
740END(irq_entries_start)
741
742/*
743 * the CPU automatically disables interrupts when executing an IRQ vector,
744 * so IRQ-flags tracing has to follow that:
745 */
746 .p2align CONFIG_X86_L1_CACHE_SHIFT
747common_interrupt:
748 ASM_CLAC
749 addl $-0x80,(%esp) /* Adjust vector into the [-256,-1] range */
750 SAVE_ALL
751 TRACE_IRQS_OFF
752 movl %esp,%eax
753 call do_IRQ
754 jmp ret_from_intr
755ENDPROC(common_interrupt)
756 CFI_ENDPROC
757
758#define BUILD_INTERRUPT3(name, nr, fn) \
759ENTRY(name) \
760 RING0_INT_FRAME; \
761 ASM_CLAC; \
762 pushl_cfi $~(nr); \
763 SAVE_ALL; \
764 TRACE_IRQS_OFF \
765 movl %esp,%eax; \
766 call fn; \
767 jmp ret_from_intr; \
768 CFI_ENDPROC; \
769ENDPROC(name)
770
771
772#ifdef CONFIG_TRACING
773#define TRACE_BUILD_INTERRUPT(name, nr) \
774 BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name)
775#else
776#define TRACE_BUILD_INTERRUPT(name, nr)
777#endif
778
779#define BUILD_INTERRUPT(name, nr) \
780 BUILD_INTERRUPT3(name, nr, smp_##name); \
781 TRACE_BUILD_INTERRUPT(name, nr)
782
783/* The include is where all of the SMP etc. interrupts come from */
784#include <asm/entry_arch.h>
785
786ENTRY(coprocessor_error)
787 RING0_INT_FRAME
788 ASM_CLAC
789 pushl_cfi $0
790 pushl_cfi $do_coprocessor_error
791 jmp error_code
792 CFI_ENDPROC
793END(coprocessor_error)
794
795ENTRY(simd_coprocessor_error)
796 RING0_INT_FRAME
797 ASM_CLAC
798 pushl_cfi $0
799#ifdef CONFIG_X86_INVD_BUG
800 /* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
801 ALTERNATIVE "pushl_cfi $do_general_protection", \
802 "pushl $do_simd_coprocessor_error", \
803 X86_FEATURE_XMM
804#else
805 pushl_cfi $do_simd_coprocessor_error
806#endif
807 jmp error_code
808 CFI_ENDPROC
809END(simd_coprocessor_error)
810
811ENTRY(device_not_available)
812 RING0_INT_FRAME
813 ASM_CLAC
814 pushl_cfi $-1 # mark this as an int
815 pushl_cfi $do_device_not_available
816 jmp error_code
817 CFI_ENDPROC
818END(device_not_available)
819
820#ifdef CONFIG_PARAVIRT
821ENTRY(native_iret)
822 iret
823 _ASM_EXTABLE(native_iret, iret_exc)
824END(native_iret)
825
826ENTRY(native_irq_enable_sysexit)
827 sti
828 sysexit
829END(native_irq_enable_sysexit)
830#endif
831
832ENTRY(overflow)
833 RING0_INT_FRAME
834 ASM_CLAC
835 pushl_cfi $0
836 pushl_cfi $do_overflow
837 jmp error_code
838 CFI_ENDPROC
839END(overflow)
840
841ENTRY(bounds)
842 RING0_INT_FRAME
843 ASM_CLAC
844 pushl_cfi $0
845 pushl_cfi $do_bounds
846 jmp error_code
847 CFI_ENDPROC
848END(bounds)
849
850ENTRY(invalid_op)
851 RING0_INT_FRAME
852 ASM_CLAC
853 pushl_cfi $0
854 pushl_cfi $do_invalid_op
855 jmp error_code
856 CFI_ENDPROC
857END(invalid_op)
858
859ENTRY(coprocessor_segment_overrun)
860 RING0_INT_FRAME
861 ASM_CLAC
862 pushl_cfi $0
863 pushl_cfi $do_coprocessor_segment_overrun
864 jmp error_code
865 CFI_ENDPROC
866END(coprocessor_segment_overrun)
867
868ENTRY(invalid_TSS)
869 RING0_EC_FRAME
870 ASM_CLAC
871 pushl_cfi $do_invalid_TSS
872 jmp error_code
873 CFI_ENDPROC
874END(invalid_TSS)
875
876ENTRY(segment_not_present)
877 RING0_EC_FRAME
878 ASM_CLAC
879 pushl_cfi $do_segment_not_present
880 jmp error_code
881 CFI_ENDPROC
882END(segment_not_present)
883
884ENTRY(stack_segment)
885 RING0_EC_FRAME
886 ASM_CLAC
887 pushl_cfi $do_stack_segment
888 jmp error_code
889 CFI_ENDPROC
890END(stack_segment)
891
892ENTRY(alignment_check)
893 RING0_EC_FRAME
894 ASM_CLAC
895 pushl_cfi $do_alignment_check
896 jmp error_code
897 CFI_ENDPROC
898END(alignment_check)
899
900ENTRY(divide_error)
901 RING0_INT_FRAME
902 ASM_CLAC
903 pushl_cfi $0 # no error code
904 pushl_cfi $do_divide_error
905 jmp error_code
906 CFI_ENDPROC
907END(divide_error)
908
909#ifdef CONFIG_X86_MCE
910ENTRY(machine_check)
911 RING0_INT_FRAME
912 ASM_CLAC
913 pushl_cfi $0
914 pushl_cfi machine_check_vector
915 jmp error_code
916 CFI_ENDPROC
917END(machine_check)
918#endif
919
920ENTRY(spurious_interrupt_bug)
921 RING0_INT_FRAME
922 ASM_CLAC
923 pushl_cfi $0
924 pushl_cfi $do_spurious_interrupt_bug
925 jmp error_code
926 CFI_ENDPROC
927END(spurious_interrupt_bug)
928
929#ifdef CONFIG_XEN
930/* Xen doesn't set %esp to be precisely what the normal sysenter
931 entrypoint expects, so fix it up before using the normal path. */
932ENTRY(xen_sysenter_target)
933 RING0_INT_FRAME
934 addl $5*4, %esp /* remove xen-provided frame */
935 CFI_ADJUST_CFA_OFFSET -5*4
936 jmp sysenter_past_esp
937 CFI_ENDPROC
938
939ENTRY(xen_hypervisor_callback)
940 CFI_STARTPROC
941 pushl_cfi $-1 /* orig_ax = -1 => not a system call */
942 SAVE_ALL
943 TRACE_IRQS_OFF
944
945 /* Check to see if we got the event in the critical
946 region in xen_iret_direct, after we've reenabled
947 events and checked for pending events. This simulates
948 iret instruction's behaviour where it delivers a
949 pending interrupt when enabling interrupts. */
950 movl PT_EIP(%esp),%eax
951 cmpl $xen_iret_start_crit,%eax
952 jb 1f
953 cmpl $xen_iret_end_crit,%eax
954 jae 1f
955
956 jmp xen_iret_crit_fixup
957
958ENTRY(xen_do_upcall)
9591: mov %esp, %eax
960 call xen_evtchn_do_upcall
961#ifndef CONFIG_PREEMPT
962 call xen_maybe_preempt_hcall
963#endif
964 jmp ret_from_intr
965 CFI_ENDPROC
966ENDPROC(xen_hypervisor_callback)
967
968# Hypervisor uses this for application faults while it executes.
969# We get here for two reasons:
970# 1. Fault while reloading DS, ES, FS or GS
971# 2. Fault while executing IRET
972# Category 1 we fix up by reattempting the load, and zeroing the segment
973# register if the load fails.
974# Category 2 we fix up by jumping to do_iret_error. We cannot use the
975# normal Linux return path in this case because if we use the IRET hypercall
976# to pop the stack frame we end up in an infinite loop of failsafe callbacks.
977# We distinguish between categories by maintaining a status value in EAX.
978ENTRY(xen_failsafe_callback)
979 CFI_STARTPROC
980 pushl_cfi %eax
981 movl $1,%eax
9821: mov 4(%esp),%ds
9832: mov 8(%esp),%es
9843: mov 12(%esp),%fs
9854: mov 16(%esp),%gs
986 /* EAX == 0 => Category 1 (Bad segment)
987 EAX != 0 => Category 2 (Bad IRET) */
988 testl %eax,%eax
989 popl_cfi %eax
990 lea 16(%esp),%esp
991 CFI_ADJUST_CFA_OFFSET -16
992 jz 5f
993 jmp iret_exc
9945: pushl_cfi $-1 /* orig_ax = -1 => not a system call */
995 SAVE_ALL
996 jmp ret_from_exception
997 CFI_ENDPROC
998
999.section .fixup,"ax"
10006: xorl %eax,%eax
1001 movl %eax,4(%esp)
1002 jmp 1b
10037: xorl %eax,%eax
1004 movl %eax,8(%esp)
1005 jmp 2b
10068: xorl %eax,%eax
1007 movl %eax,12(%esp)
1008 jmp 3b
10099: xorl %eax,%eax
1010 movl %eax,16(%esp)
1011 jmp 4b
1012.previous
1013 _ASM_EXTABLE(1b,6b)
1014 _ASM_EXTABLE(2b,7b)
1015 _ASM_EXTABLE(3b,8b)
1016 _ASM_EXTABLE(4b,9b)
1017ENDPROC(xen_failsafe_callback)
1018
1019BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
1020 xen_evtchn_do_upcall)
1021
1022#endif /* CONFIG_XEN */
1023
1024#if IS_ENABLED(CONFIG_HYPERV)
1025
1026BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
1027 hyperv_vector_handler)
1028
1029#endif /* CONFIG_HYPERV */
1030
1031#ifdef CONFIG_FUNCTION_TRACER
1032#ifdef CONFIG_DYNAMIC_FTRACE
1033
1034ENTRY(mcount)
1035 ret
1036END(mcount)
1037
1038ENTRY(ftrace_caller)
1039 pushl %eax
1040 pushl %ecx
1041 pushl %edx
1042 pushl $0 /* Pass NULL as regs pointer */
1043 movl 4*4(%esp), %eax
1044 movl 0x4(%ebp), %edx
1045 movl function_trace_op, %ecx
1046 subl $MCOUNT_INSN_SIZE, %eax
1047
1048.globl ftrace_call
1049ftrace_call:
1050 call ftrace_stub
1051
1052 addl $4,%esp /* skip NULL pointer */
1053 popl %edx
1054 popl %ecx
1055 popl %eax
1056ftrace_ret:
1057#ifdef CONFIG_FUNCTION_GRAPH_TRACER
1058.globl ftrace_graph_call
1059ftrace_graph_call:
1060 jmp ftrace_stub
1061#endif
1062
1063.globl ftrace_stub
1064ftrace_stub:
1065 ret
1066END(ftrace_caller)
1067
1068ENTRY(ftrace_regs_caller)
1069 pushf /* push flags before compare (in cs location) */
1070
1071 /*
1072 * i386 does not save SS and ESP when coming from kernel.
1073 * Instead, to get sp, &regs->sp is used (see ptrace.h).
1074 * Unfortunately, that means eflags must be at the same location
1075 * as the current return ip is. We move the return ip into the
1076 * ip location, and move flags into the return ip location.
1077 */
1078 pushl 4(%esp) /* save return ip into ip slot */
1079
1080 pushl $0 /* Load 0 into orig_ax */
1081 pushl %gs
1082 pushl %fs
1083 pushl %es
1084 pushl %ds
1085 pushl %eax
1086 pushl %ebp
1087 pushl %edi
1088 pushl %esi
1089 pushl %edx
1090 pushl %ecx
1091 pushl %ebx
1092
1093 movl 13*4(%esp), %eax /* Get the saved flags */
1094 movl %eax, 14*4(%esp) /* Move saved flags into regs->flags location */
1095 /* clobbering return ip */
1096 movl $__KERNEL_CS,13*4(%esp)
1097
1098 movl 12*4(%esp), %eax /* Load ip (1st parameter) */
1099 subl $MCOUNT_INSN_SIZE, %eax /* Adjust ip */
1100 movl 0x4(%ebp), %edx /* Load parent ip (2nd parameter) */
1101 movl function_trace_op, %ecx /* Save ftrace_pos in 3rd parameter */
1102 pushl %esp /* Save pt_regs as 4th parameter */
1103
1104GLOBAL(ftrace_regs_call)
1105 call ftrace_stub
1106
1107 addl $4, %esp /* Skip pt_regs */
1108 movl 14*4(%esp), %eax /* Move flags back into cs */
1109 movl %eax, 13*4(%esp) /* Needed to keep addl from modifying flags */
1110 movl 12*4(%esp), %eax /* Get return ip from regs->ip */
1111 movl %eax, 14*4(%esp) /* Put return ip back for ret */
1112
1113 popl %ebx
1114 popl %ecx
1115 popl %edx
1116 popl %esi
1117 popl %edi
1118 popl %ebp
1119 popl %eax
1120 popl %ds
1121 popl %es
1122 popl %fs
1123 popl %gs
1124 addl $8, %esp /* Skip orig_ax and ip */
1125 popf /* Pop flags at end (no addl to corrupt flags) */
1126 jmp ftrace_ret
1127
1128 popf
1129 jmp ftrace_stub
1130#else /* ! CONFIG_DYNAMIC_FTRACE */
1131
1132ENTRY(mcount)
1133 cmpl $__PAGE_OFFSET, %esp
1134 jb ftrace_stub /* Paging not enabled yet? */
1135
1136 cmpl $ftrace_stub, ftrace_trace_function
1137 jnz trace
1138#ifdef CONFIG_FUNCTION_GRAPH_TRACER
1139 cmpl $ftrace_stub, ftrace_graph_return
1140 jnz ftrace_graph_caller
1141
1142 cmpl $ftrace_graph_entry_stub, ftrace_graph_entry
1143 jnz ftrace_graph_caller
1144#endif
1145.globl ftrace_stub
1146ftrace_stub:
1147 ret
1148
1149 /* taken from glibc */
1150trace:
1151 pushl %eax
1152 pushl %ecx
1153 pushl %edx
1154 movl 0xc(%esp), %eax
1155 movl 0x4(%ebp), %edx
1156 subl $MCOUNT_INSN_SIZE, %eax
1157
1158 call *ftrace_trace_function
1159
1160 popl %edx
1161 popl %ecx
1162 popl %eax
1163 jmp ftrace_stub
1164END(mcount)
1165#endif /* CONFIG_DYNAMIC_FTRACE */
1166#endif /* CONFIG_FUNCTION_TRACER */
1167
1168#ifdef CONFIG_FUNCTION_GRAPH_TRACER
1169ENTRY(ftrace_graph_caller)
1170 pushl %eax
1171 pushl %ecx
1172 pushl %edx
1173 movl 0xc(%esp), %eax
1174 lea 0x4(%ebp), %edx
1175 movl (%ebp), %ecx
1176 subl $MCOUNT_INSN_SIZE, %eax
1177 call prepare_ftrace_return
1178 popl %edx
1179 popl %ecx
1180 popl %eax
1181 ret
1182END(ftrace_graph_caller)
1183
1184.globl return_to_handler
1185return_to_handler:
1186 pushl %eax
1187 pushl %edx
1188 movl %ebp, %eax
1189 call ftrace_return_to_handler
1190 movl %eax, %ecx
1191 popl %edx
1192 popl %eax
1193 jmp *%ecx
1194#endif
1195
1196#ifdef CONFIG_TRACING
1197ENTRY(trace_page_fault)
1198 RING0_EC_FRAME
1199 ASM_CLAC
1200 pushl_cfi $trace_do_page_fault
1201 jmp error_code
1202 CFI_ENDPROC
1203END(trace_page_fault)
1204#endif
1205
1206ENTRY(page_fault)
1207 RING0_EC_FRAME
1208 ASM_CLAC
1209 pushl_cfi $do_page_fault
1210 ALIGN
1211error_code:
1212 /* the function address is in %gs's slot on the stack */
1213 pushl_cfi %fs
1214 /*CFI_REL_OFFSET fs, 0*/
1215 pushl_cfi %es
1216 /*CFI_REL_OFFSET es, 0*/
1217 pushl_cfi %ds
1218 /*CFI_REL_OFFSET ds, 0*/
1219 pushl_cfi_reg eax
1220 pushl_cfi_reg ebp
1221 pushl_cfi_reg edi
1222 pushl_cfi_reg esi
1223 pushl_cfi_reg edx
1224 pushl_cfi_reg ecx
1225 pushl_cfi_reg ebx
1226 cld
1227 movl $(__KERNEL_PERCPU), %ecx
1228 movl %ecx, %fs
1229 UNWIND_ESPFIX_STACK
1230 GS_TO_REG %ecx
1231 movl PT_GS(%esp), %edi # get the function address
1232 movl PT_ORIG_EAX(%esp), %edx # get the error code
1233 movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart
1234 REG_TO_PTGS %ecx
1235 SET_KERNEL_GS %ecx
1236 movl $(__USER_DS), %ecx
1237 movl %ecx, %ds
1238 movl %ecx, %es
1239 TRACE_IRQS_OFF
1240 movl %esp,%eax # pt_regs pointer
1241 call *%edi
1242 jmp ret_from_exception
1243 CFI_ENDPROC
1244END(page_fault)
1245
1246/*
1247 * Debug traps and NMI can happen at the one SYSENTER instruction
1248 * that sets up the real kernel stack. Check here, since we can't
1249 * allow the wrong stack to be used.
1250 *
1251 * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have
1252 * already pushed 3 words if it hits on the sysenter instruction:
1253 * eflags, cs and eip.
1254 *
1255 * We just load the right stack, and push the three (known) values
1256 * by hand onto the new stack - while updating the return eip past
1257 * the instruction that would have done it for sysenter.
1258 */
1259.macro FIX_STACK offset ok label
1260 cmpw $__KERNEL_CS, 4(%esp)
1261 jne \ok
1262\label:
1263 movl TSS_sysenter_sp0 + \offset(%esp), %esp
1264 CFI_DEF_CFA esp, 0
1265 CFI_UNDEFINED eip
1266 pushfl_cfi
1267 pushl_cfi $__KERNEL_CS
1268 pushl_cfi $sysenter_past_esp
1269 CFI_REL_OFFSET eip, 0
1270.endm
1271
1272ENTRY(debug)
1273 RING0_INT_FRAME
1274 ASM_CLAC
1275 cmpl $ia32_sysenter_target,(%esp)
1276 jne debug_stack_correct
1277 FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
1278debug_stack_correct:
1279 pushl_cfi $-1 # mark this as an int
1280 SAVE_ALL
1281 TRACE_IRQS_OFF
1282 xorl %edx,%edx # error code 0
1283 movl %esp,%eax # pt_regs pointer
1284 call do_debug
1285 jmp ret_from_exception
1286 CFI_ENDPROC
1287END(debug)
1288
1289/*
1290 * NMI is doubly nasty. It can happen _while_ we're handling
1291 * a debug fault, and the debug fault hasn't yet been able to
1292 * clear up the stack. So we first check whether we got an
1293 * NMI on the sysenter entry path, but after that we need to
1294 * check whether we got an NMI on the debug path where the debug
1295 * fault happened on the sysenter path.
1296 */
1297ENTRY(nmi)
1298 RING0_INT_FRAME
1299 ASM_CLAC
1300#ifdef CONFIG_X86_ESPFIX32
1301 pushl_cfi %eax
1302 movl %ss, %eax
1303 cmpw $__ESPFIX_SS, %ax
1304 popl_cfi %eax
1305 je nmi_espfix_stack
1306#endif
1307 cmpl $ia32_sysenter_target,(%esp)
1308 je nmi_stack_fixup
1309 pushl_cfi %eax
1310 movl %esp,%eax
1311 /* Do not access memory above the end of our stack page,
1312 * it might not exist.
1313 */
1314 andl $(THREAD_SIZE-1),%eax
1315 cmpl $(THREAD_SIZE-20),%eax
1316 popl_cfi %eax
1317 jae nmi_stack_correct
1318 cmpl $ia32_sysenter_target,12(%esp)
1319 je nmi_debug_stack_check
1320nmi_stack_correct:
1321 /* We have a RING0_INT_FRAME here */
1322 pushl_cfi %eax
1323 SAVE_ALL
1324 xorl %edx,%edx # zero error code
1325 movl %esp,%eax # pt_regs pointer
1326 call do_nmi
1327 jmp restore_all_notrace
1328 CFI_ENDPROC
1329
1330nmi_stack_fixup:
1331 RING0_INT_FRAME
1332 FIX_STACK 12, nmi_stack_correct, 1
1333 jmp nmi_stack_correct
1334
1335nmi_debug_stack_check:
1336 /* We have a RING0_INT_FRAME here */
1337 cmpw $__KERNEL_CS,16(%esp)
1338 jne nmi_stack_correct
1339 cmpl $debug,(%esp)
1340 jb nmi_stack_correct
1341 cmpl $debug_esp_fix_insn,(%esp)
1342 ja nmi_stack_correct
1343 FIX_STACK 24, nmi_stack_correct, 1
1344 jmp nmi_stack_correct
1345
1346#ifdef CONFIG_X86_ESPFIX32
1347nmi_espfix_stack:
1348 /* We have a RING0_INT_FRAME here.
1349 *
1350 * create the pointer to lss back
1351 */
1352 pushl_cfi %ss
1353 pushl_cfi %esp
1354 addl $4, (%esp)
1355 /* copy the iret frame of 12 bytes */
1356 .rept 3
1357 pushl_cfi 16(%esp)
1358 .endr
1359 pushl_cfi %eax
1360 SAVE_ALL
1361 FIXUP_ESPFIX_STACK # %eax == %esp
1362 xorl %edx,%edx # zero error code
1363 call do_nmi
1364 RESTORE_REGS
1365 lss 12+4(%esp), %esp # back to espfix stack
1366 CFI_ADJUST_CFA_OFFSET -24
1367 jmp irq_return
1368#endif
1369 CFI_ENDPROC
1370END(nmi)
1371
1372ENTRY(int3)
1373 RING0_INT_FRAME
1374 ASM_CLAC
1375 pushl_cfi $-1 # mark this as an int
1376 SAVE_ALL
1377 TRACE_IRQS_OFF
1378 xorl %edx,%edx # zero error code
1379 movl %esp,%eax # pt_regs pointer
1380 call do_int3
1381 jmp ret_from_exception
1382 CFI_ENDPROC
1383END(int3)
1384
1385ENTRY(general_protection)
1386 RING0_EC_FRAME
1387 pushl_cfi $do_general_protection
1388 jmp error_code
1389 CFI_ENDPROC
1390END(general_protection)
1391
1392#ifdef CONFIG_KVM_GUEST
1393ENTRY(async_page_fault)
1394 RING0_EC_FRAME
1395 ASM_CLAC
1396 pushl_cfi $do_async_page_fault
1397 jmp error_code
1398 CFI_ENDPROC
1399END(async_page_fault)
1400#endif
1401
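One piece of the deleted file worth a closing illustration is the ESPFIX path: on a 16-bit user stack segment, IRET restores only the low word of ESP, so ldt_ss builds a stack whose high word already matches userspace and hides the difference in the segment base. The arithmetic, with made-up values (stand-alone C, not kernel code):

#include <stdio.h>

int main(void)
{
        unsigned int kernel_esp = 0xc1a03f20u;  /* hypothetical kernel stack pointer */
        unsigned int user_esp   = 0x0012ff80u;  /* hypothetical userspace ESP */

        /* New ESP: high word from userspace, low word from the kernel stack. */
        unsigned int new_esp = (user_esp & 0xffff0000u) | (kernel_esp & 0xffffu);

        /* ESPFIX segment base absorbs the difference; its low word is zero. */
        unsigned int base = kernel_esp - new_esp;

        /* Kernel stack accesses through the ESPFIX segment still land on the
         * real stack ... */
        printf("base + new_esp = %#x (kernel_esp = %#x)\n", base + new_esp, kernel_esp);
        /* ... while the stale high word that IRET leaves in ESP is the user's own. */
        printf("high word of new_esp = %#x, of user_esp = %#x\n",
               new_esp >> 16, user_esp >> 16);
        return 0;
}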
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 7e429c99c728..0e2d96ffd158 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -557,7 +557,7 @@ early_idt_handler_common:
557 cld 557 cld
558 558
559 cmpl $2,(%esp) # X86_TRAP_NMI 559 cmpl $2,(%esp) # X86_TRAP_NMI
560 je is_nmi # Ignore NMI 560 je .Lis_nmi # Ignore NMI
561 561
562 cmpl $2,%ss:early_recursion_flag 562 cmpl $2,%ss:early_recursion_flag
563 je hlt_loop 563 je hlt_loop
@@ -610,7 +610,7 @@ ex_entry:
610 pop %ecx 610 pop %ecx
611 pop %eax 611 pop %eax
612 decl %ss:early_recursion_flag 612 decl %ss:early_recursion_flag
613is_nmi: 613.Lis_nmi:
614 addl $8,%esp /* drop vector number and error code */ 614 addl $8,%esp /* drop vector number and error code */
615 iret 615 iret
616ENDPROC(early_idt_handler_common) 616ENDPROC(early_idt_handler_common)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index df7e78057ae0..e5c27f729a38 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -346,7 +346,7 @@ early_idt_handler_common:
346 cld 346 cld
347 347
348 cmpl $2,(%rsp) # X86_TRAP_NMI 348 cmpl $2,(%rsp) # X86_TRAP_NMI
349 je is_nmi # Ignore NMI 349 je .Lis_nmi # Ignore NMI
350 350
351 cmpl $2,early_recursion_flag(%rip) 351 cmpl $2,early_recursion_flag(%rip)
352 jz 1f 352 jz 1f
@@ -411,7 +411,7 @@ early_idt_handler_common:
411 popq %rcx 411 popq %rcx
412 popq %rax 412 popq %rax
413 decl early_recursion_flag(%rip) 413 decl early_recursion_flag(%rip)
414is_nmi: 414.Lis_nmi:
415 addq $16,%rsp # drop vector number and error code 415 addq $16,%rsp # drop vector number and error code
416 INTERRUPT_RETURN 416 INTERRUPT_RETURN
417ENDPROC(early_idt_handler_common) 417ENDPROC(early_idt_handler_common)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3acbff4716b0..10757d0a3fcf 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -12,6 +12,7 @@
12#include <linux/pm.h> 12#include <linux/pm.h>
13#include <linux/io.h> 13#include <linux/io.h>
14 14
15#include <asm/irqdomain.h>
15#include <asm/fixmap.h> 16#include <asm/fixmap.h>
16#include <asm/hpet.h> 17#include <asm/hpet.h>
17#include <asm/time.h> 18#include <asm/time.h>
@@ -305,8 +306,6 @@ static void hpet_legacy_clockevent_register(void)
305 printk(KERN_DEBUG "hpet clockevent registered\n"); 306 printk(KERN_DEBUG "hpet clockevent registered\n");
306} 307}
307 308
308static int hpet_setup_msi_irq(unsigned int irq);
309
310static void hpet_set_mode(enum clock_event_mode mode, 309static void hpet_set_mode(enum clock_event_mode mode,
311 struct clock_event_device *evt, int timer) 310 struct clock_event_device *evt, int timer)
312{ 311{
@@ -357,7 +356,7 @@ static void hpet_set_mode(enum clock_event_mode mode,
357 hpet_enable_legacy_int(); 356 hpet_enable_legacy_int();
358 } else { 357 } else {
359 struct hpet_dev *hdev = EVT_TO_HPET_DEV(evt); 358 struct hpet_dev *hdev = EVT_TO_HPET_DEV(evt);
360 hpet_setup_msi_irq(hdev->irq); 359 irq_domain_activate_irq(irq_get_irq_data(hdev->irq));
361 disable_irq(hdev->irq); 360 disable_irq(hdev->irq);
362 irq_set_affinity(hdev->irq, cpumask_of(hdev->cpu)); 361 irq_set_affinity(hdev->irq, cpumask_of(hdev->cpu));
363 enable_irq(hdev->irq); 362 enable_irq(hdev->irq);
@@ -423,6 +422,7 @@ static int hpet_legacy_next_event(unsigned long delta,
423 422
424static DEFINE_PER_CPU(struct hpet_dev *, cpu_hpet_dev); 423static DEFINE_PER_CPU(struct hpet_dev *, cpu_hpet_dev);
425static struct hpet_dev *hpet_devs; 424static struct hpet_dev *hpet_devs;
425static struct irq_domain *hpet_domain;
426 426
427void hpet_msi_unmask(struct irq_data *data) 427void hpet_msi_unmask(struct irq_data *data)
428{ 428{
@@ -473,31 +473,6 @@ static int hpet_msi_next_event(unsigned long delta,
473 return hpet_next_event(delta, evt, hdev->num); 473 return hpet_next_event(delta, evt, hdev->num);
474} 474}
475 475
476static int hpet_setup_msi_irq(unsigned int irq)
477{
478 if (x86_msi.setup_hpet_msi(irq, hpet_blockid)) {
479 irq_free_hwirq(irq);
480 return -EINVAL;
481 }
482 return 0;
483}
484
485static int hpet_assign_irq(struct hpet_dev *dev)
486{
487 unsigned int irq = irq_alloc_hwirq(-1);
488
489 if (!irq)
490 return -EINVAL;
491
492 irq_set_handler_data(irq, dev);
493
494 if (hpet_setup_msi_irq(irq))
495 return -EINVAL;
496
497 dev->irq = irq;
498 return 0;
499}
500
501static irqreturn_t hpet_interrupt_handler(int irq, void *data) 476static irqreturn_t hpet_interrupt_handler(int irq, void *data)
502{ 477{
503 struct hpet_dev *dev = (struct hpet_dev *)data; 478 struct hpet_dev *dev = (struct hpet_dev *)data;
@@ -540,9 +515,6 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
540 if (!(hdev->flags & HPET_DEV_VALID)) 515 if (!(hdev->flags & HPET_DEV_VALID))
541 return; 516 return;
542 517
543 if (hpet_setup_msi_irq(hdev->irq))
544 return;
545
546 hdev->cpu = cpu; 518 hdev->cpu = cpu;
547 per_cpu(cpu_hpet_dev, cpu) = hdev; 519 per_cpu(cpu_hpet_dev, cpu) = hdev;
548 evt->name = hdev->name; 520 evt->name = hdev->name;
@@ -574,7 +546,7 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
574 unsigned int id; 546 unsigned int id;
575 unsigned int num_timers; 547 unsigned int num_timers;
576 unsigned int num_timers_used = 0; 548 unsigned int num_timers_used = 0;
577 int i; 549 int i, irq;
578 550
579 if (hpet_msi_disable) 551 if (hpet_msi_disable)
580 return; 552 return;
@@ -587,6 +559,10 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
587 num_timers++; /* Value read out starts from 0 */ 559 num_timers++; /* Value read out starts from 0 */
588 hpet_print_config(); 560 hpet_print_config();
589 561
562 hpet_domain = hpet_create_irq_domain(hpet_blockid);
563 if (!hpet_domain)
564 return;
565
590 hpet_devs = kzalloc(sizeof(struct hpet_dev) * num_timers, GFP_KERNEL); 566 hpet_devs = kzalloc(sizeof(struct hpet_dev) * num_timers, GFP_KERNEL);
591 if (!hpet_devs) 567 if (!hpet_devs)
592 return; 568 return;
@@ -604,12 +580,14 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
604 hdev->flags = 0; 580 hdev->flags = 0;
605 if (cfg & HPET_TN_PERIODIC_CAP) 581 if (cfg & HPET_TN_PERIODIC_CAP)
606 hdev->flags |= HPET_DEV_PERI_CAP; 582 hdev->flags |= HPET_DEV_PERI_CAP;
583 sprintf(hdev->name, "hpet%d", i);
607 hdev->num = i; 584 hdev->num = i;
608 585
609 sprintf(hdev->name, "hpet%d", i); 586 irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
610 if (hpet_assign_irq(hdev)) 587 if (irq <= 0)
611 continue; 588 continue;
612 589
590 hdev->irq = irq;
613 hdev->flags |= HPET_DEV_FSB_CAP; 591 hdev->flags |= HPET_DEV_FSB_CAP;
614 hdev->flags |= HPET_DEV_VALID; 592 hdev->flags |= HPET_DEV_VALID;
615 num_timers_used++; 593 num_timers_used++;
@@ -709,10 +687,6 @@ static int hpet_cpuhp_notify(struct notifier_block *n,
709} 687}
710#else 688#else
711 689
712static int hpet_setup_msi_irq(unsigned int irq)
713{
714 return 0;
715}
716static void hpet_msi_capability_lookup(unsigned int start_timer) 690static void hpet_msi_capability_lookup(unsigned int start_timer)
717{ 691{
718 return; 692 return;
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index e7cc5370cd2f..16cb827a5b27 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -329,8 +329,8 @@ static void init_8259A(int auto_eoi)
329 */ 329 */
330 outb_pic(0x11, PIC_MASTER_CMD); /* ICW1: select 8259A-1 init */ 330 outb_pic(0x11, PIC_MASTER_CMD); /* ICW1: select 8259A-1 init */
331 331
332 /* ICW2: 8259A-1 IR0-7 mapped to 0x30-0x37 */ 332 /* ICW2: 8259A-1 IR0-7 mapped to ISA_IRQ_VECTOR(0) */
333 outb_pic(IRQ0_VECTOR, PIC_MASTER_IMR); 333 outb_pic(ISA_IRQ_VECTOR(0), PIC_MASTER_IMR);
334 334
335 /* 8259A-1 (the master) has a slave on IR2 */ 335 /* 8259A-1 (the master) has a slave on IR2 */
336 outb_pic(1U << PIC_CASCADE_IR, PIC_MASTER_IMR); 336 outb_pic(1U << PIC_CASCADE_IR, PIC_MASTER_IMR);
@@ -342,8 +342,8 @@ static void init_8259A(int auto_eoi)
342 342
343 outb_pic(0x11, PIC_SLAVE_CMD); /* ICW1: select 8259A-2 init */ 343 outb_pic(0x11, PIC_SLAVE_CMD); /* ICW1: select 8259A-2 init */
344 344
345 /* ICW2: 8259A-2 IR0-7 mapped to IRQ8_VECTOR */ 345 /* ICW2: 8259A-2 IR0-7 mapped to ISA_IRQ_VECTOR(8) */
346 outb_pic(IRQ8_VECTOR, PIC_SLAVE_IMR); 346 outb_pic(ISA_IRQ_VECTOR(8), PIC_SLAVE_IMR);
347 /* 8259A-2 is a slave on master's IR2 */ 347 /* 8259A-2 is a slave on master's IR2 */
348 outb_pic(PIC_CASCADE_IR, PIC_SLAVE_IMR); 348 outb_pic(PIC_CASCADE_IR, PIC_SLAVE_IMR);
349 /* (slave's support for AEOI in flat mode is to be investigated) */ 349 /* (slave's support for AEOI in flat mode is to be investigated) */
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index e5952c225532..88b366487b0e 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -22,6 +22,12 @@
22#define CREATE_TRACE_POINTS 22#define CREATE_TRACE_POINTS
23#include <asm/trace/irq_vectors.h> 23#include <asm/trace/irq_vectors.h>
24 24
25DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
26EXPORT_PER_CPU_SYMBOL(irq_stat);
27
28DEFINE_PER_CPU(struct pt_regs *, irq_regs);
29EXPORT_PER_CPU_SYMBOL(irq_regs);
30
25atomic_t irq_err_count; 31atomic_t irq_err_count;
26 32
27/* Function pointer for generic interrupt vector handling */ 33/* Function pointer for generic interrupt vector handling */
@@ -116,6 +122,12 @@ int arch_show_interrupts(struct seq_file *p, int prec)
116 seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count); 122 seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
117 seq_puts(p, " Threshold APIC interrupts\n"); 123 seq_puts(p, " Threshold APIC interrupts\n");
118#endif 124#endif
125#ifdef CONFIG_X86_MCE_AMD
126 seq_printf(p, "%*s: ", prec, "DFR");
127 for_each_online_cpu(j)
128 seq_printf(p, "%10u ", irq_stats(j)->irq_deferred_error_count);
129 seq_puts(p, " Deferred Error APIC interrupts\n");
130#endif
119#ifdef CONFIG_X86_MCE 131#ifdef CONFIG_X86_MCE
120 seq_printf(p, "%*s: ", prec, "MCE"); 132 seq_printf(p, "%*s: ", prec, "MCE");
121 for_each_online_cpu(j) 133 for_each_online_cpu(j)
@@ -136,6 +148,18 @@ int arch_show_interrupts(struct seq_file *p, int prec)
136#if defined(CONFIG_X86_IO_APIC) 148#if defined(CONFIG_X86_IO_APIC)
137 seq_printf(p, "%*s: %10u\n", prec, "MIS", atomic_read(&irq_mis_count)); 149 seq_printf(p, "%*s: %10u\n", prec, "MIS", atomic_read(&irq_mis_count));
138#endif 150#endif
151#ifdef CONFIG_HAVE_KVM
152 seq_printf(p, "%*s: ", prec, "PIN");
153 for_each_online_cpu(j)
154 seq_printf(p, "%10u ", irq_stats(j)->kvm_posted_intr_ipis);
155 seq_puts(p, " Posted-interrupt notification event\n");
156
157 seq_printf(p, "%*s: ", prec, "PIW");
158 for_each_online_cpu(j)
159 seq_printf(p, "%10u ",
160 irq_stats(j)->kvm_posted_intr_wakeup_ipis);
161 seq_puts(p, " Posted-interrupt wakeup event\n");
162#endif
139 return 0; 163 return 0;
140} 164}
141 165
@@ -192,8 +216,7 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
192 unsigned vector = ~regs->orig_ax; 216 unsigned vector = ~regs->orig_ax;
193 unsigned irq; 217 unsigned irq;
194 218
195 irq_enter(); 219 entering_irq();
196 exit_idle();
197 220
198 irq = __this_cpu_read(vector_irq[vector]); 221 irq = __this_cpu_read(vector_irq[vector]);
199 222
@@ -209,7 +232,7 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
209 } 232 }
210 } 233 }
211 234
212 irq_exit(); 235 exiting_irq();
213 236
214 set_irq_regs(old_regs); 237 set_irq_regs(old_regs);
215 return 1; 238 return 1;
@@ -237,6 +260,18 @@ __visible void smp_x86_platform_ipi(struct pt_regs *regs)
237} 260}
238 261
239#ifdef CONFIG_HAVE_KVM 262#ifdef CONFIG_HAVE_KVM
263static void dummy_handler(void) {}
264static void (*kvm_posted_intr_wakeup_handler)(void) = dummy_handler;
265
266void kvm_set_posted_intr_wakeup_handler(void (*handler)(void))
267{
268 if (handler)
269 kvm_posted_intr_wakeup_handler = handler;
270 else
271 kvm_posted_intr_wakeup_handler = dummy_handler;
272}
273EXPORT_SYMBOL_GPL(kvm_set_posted_intr_wakeup_handler);
274
240/* 275/*
241 * Handler for POSTED_INTERRUPT_VECTOR. 276 * Handler for POSTED_INTERRUPT_VECTOR.
242 */ 277 */
@@ -244,16 +279,23 @@ __visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
244{ 279{
245 struct pt_regs *old_regs = set_irq_regs(regs); 280 struct pt_regs *old_regs = set_irq_regs(regs);
246 281
247 ack_APIC_irq(); 282 entering_ack_irq();
248
249 irq_enter();
250
251 exit_idle();
252
253 inc_irq_stat(kvm_posted_intr_ipis); 283 inc_irq_stat(kvm_posted_intr_ipis);
284 exiting_irq();
285 set_irq_regs(old_regs);
286}
254 287
255 irq_exit(); 288/*
289 * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
290 */
291__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
292{
293 struct pt_regs *old_regs = set_irq_regs(regs);
256 294
295 entering_ack_irq();
296 inc_irq_stat(kvm_posted_intr_wakeup_ipis);
297 kvm_posted_intr_wakeup_handler();
298 exiting_irq();
257 set_irq_regs(old_regs); 299 set_irq_regs(old_regs);
258} 300}
259#endif 301#endif
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index f9fd86a7fcc7..cd74f5978ab9 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -21,12 +21,6 @@
21 21
22#include <asm/apic.h> 22#include <asm/apic.h>
23 23
24DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
25EXPORT_PER_CPU_SYMBOL(irq_stat);
26
27DEFINE_PER_CPU(struct pt_regs *, irq_regs);
28EXPORT_PER_CPU_SYMBOL(irq_regs);
29
30#ifdef CONFIG_DEBUG_STACKOVERFLOW 24#ifdef CONFIG_DEBUG_STACKOVERFLOW
31 25
32int sysctl_panic_on_stackoverflow __read_mostly; 26int sysctl_panic_on_stackoverflow __read_mostly;
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 394e643d7830..bc4604e500a3 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -20,12 +20,6 @@
20#include <asm/idle.h> 20#include <asm/idle.h>
21#include <asm/apic.h> 21#include <asm/apic.h>
22 22
23DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
24EXPORT_PER_CPU_SYMBOL(irq_stat);
25
26DEFINE_PER_CPU(struct pt_regs *, irq_regs);
27EXPORT_PER_CPU_SYMBOL(irq_regs);
28
29int sysctl_panic_on_stackoverflow; 23int sysctl_panic_on_stackoverflow;
30 24
31/* 25/*
diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
index 15d741ddfeeb..dc5fa6a1e8d6 100644
--- a/arch/x86/kernel/irq_work.c
+++ b/arch/x86/kernel/irq_work.c
@@ -10,12 +10,6 @@
10#include <asm/apic.h> 10#include <asm/apic.h>
11#include <asm/trace/irq_vectors.h> 11#include <asm/trace/irq_vectors.h>
12 12
13static inline void irq_work_entering_irq(void)
14{
15 irq_enter();
16 ack_APIC_irq();
17}
18
19static inline void __smp_irq_work_interrupt(void) 13static inline void __smp_irq_work_interrupt(void)
20{ 14{
21 inc_irq_stat(apic_irq_work_irqs); 15 inc_irq_stat(apic_irq_work_irqs);
@@ -24,14 +18,14 @@ static inline void __smp_irq_work_interrupt(void)
24 18
25__visible void smp_irq_work_interrupt(struct pt_regs *regs) 19__visible void smp_irq_work_interrupt(struct pt_regs *regs)
26{ 20{
27 irq_work_entering_irq(); 21 ipi_entering_ack_irq();
28 __smp_irq_work_interrupt(); 22 __smp_irq_work_interrupt();
29 exiting_irq(); 23 exiting_irq();
30} 24}
31 25
32__visible void smp_trace_irq_work_interrupt(struct pt_regs *regs) 26__visible void smp_trace_irq_work_interrupt(struct pt_regs *regs)
33{ 27{
34 irq_work_entering_irq(); 28 ipi_entering_ack_irq();
35 trace_irq_work_entry(IRQ_WORK_VECTOR); 29 trace_irq_work_entry(IRQ_WORK_VECTOR);
36 __smp_irq_work_interrupt(); 30 __smp_irq_work_interrupt();
37 trace_irq_work_exit(IRQ_WORK_VECTOR); 31 trace_irq_work_exit(IRQ_WORK_VECTOR);
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index cd10a6437264..a3a5e158ed69 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -86,7 +86,7 @@ void __init init_IRQ(void)
86 int i; 86 int i;
87 87
88 /* 88 /*
89 * On cpu 0, Assign IRQ0_VECTOR..IRQ15_VECTOR's to IRQ 0..15. 89 * On cpu 0, Assign ISA_IRQ_VECTOR(irq) to IRQ 0..15.
90 * If these IRQ's are handled by legacy interrupt-controllers like PIC, 90 * If these IRQ's are handled by legacy interrupt-controllers like PIC,
91 * then this configuration will likely be static after the boot. If 91 * then this configuration will likely be static after the boot. If
 92 * these IRQ's are handled by more modern controllers like IO-APIC, 92 * these IRQ's are handled by more modern controllers like IO-APIC,
@@ -94,7 +94,7 @@ void __init init_IRQ(void)
94 * irq's migrate etc. 94 * irq's migrate etc.
95 */ 95 */
96 for (i = 0; i < nr_legacy_irqs(); i++) 96 for (i = 0; i < nr_legacy_irqs(); i++)
97 per_cpu(vector_irq, 0)[IRQ0_VECTOR + i] = i; 97 per_cpu(vector_irq, 0)[ISA_IRQ_VECTOR(i)] = i;
98 98
99 x86_init.irqs.intr_init(); 99 x86_init.irqs.intr_init();
100} 100}
@@ -135,6 +135,10 @@ static void __init apic_intr_init(void)
135 alloc_intr_gate(THRESHOLD_APIC_VECTOR, threshold_interrupt); 135 alloc_intr_gate(THRESHOLD_APIC_VECTOR, threshold_interrupt);
136#endif 136#endif
137 137
138#ifdef CONFIG_X86_MCE_AMD
139 alloc_intr_gate(DEFERRED_ERROR_VECTOR, deferred_error_interrupt);
140#endif
141
138#ifdef CONFIG_X86_LOCAL_APIC 142#ifdef CONFIG_X86_LOCAL_APIC
139 /* self generated IPI for local APIC timer */ 143 /* self generated IPI for local APIC timer */
140 alloc_intr_gate(LOCAL_TIMER_VECTOR, apic_timer_interrupt); 144 alloc_intr_gate(LOCAL_TIMER_VECTOR, apic_timer_interrupt);
@@ -144,6 +148,8 @@ static void __init apic_intr_init(void)
144#ifdef CONFIG_HAVE_KVM 148#ifdef CONFIG_HAVE_KVM
145 /* IPI for KVM to deliver posted interrupt */ 149 /* IPI for KVM to deliver posted interrupt */
146 alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi); 150 alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
151 /* IPI for KVM to deliver interrupt to wake up tasks */
152 alloc_intr_gate(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi);
147#endif 153#endif
148 154
149 /* IPI vectors for APIC spurious and error interrupts */ 155 /* IPI vectors for APIC spurious and error interrupts */
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 415480d3ea84..11546b462fa6 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -17,6 +17,7 @@
17#include <linux/ftrace.h> 17#include <linux/ftrace.h>
18#include <linux/io.h> 18#include <linux/io.h>
19#include <linux/suspend.h> 19#include <linux/suspend.h>
20#include <linux/vmalloc.h>
20 21
21#include <asm/init.h> 22#include <asm/init.h>
22#include <asm/pgtable.h> 23#include <asm/pgtable.h>
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 2d2a237f2c73..30ca7607cbbb 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -19,8 +19,8 @@
19#include <linux/module.h> 19#include <linux/module.h>
20#include <linux/smp.h> 20#include <linux/smp.h>
21#include <linux/pci.h> 21#include <linux/pci.h>
22#include <linux/irqdomain.h>
23 22
23#include <asm/irqdomain.h>
24#include <asm/mtrr.h> 24#include <asm/mtrr.h>
25#include <asm/mpspec.h> 25#include <asm/mpspec.h>
26#include <asm/pgalloc.h> 26#include <asm/pgalloc.h>
@@ -113,11 +113,6 @@ static void __init MP_bus_info(struct mpc_bus *m)
113 pr_warn("Unknown bustype %s - ignoring\n", str); 113 pr_warn("Unknown bustype %s - ignoring\n", str);
114} 114}
115 115
116static struct irq_domain_ops mp_ioapic_irqdomain_ops = {
117 .map = mp_irqdomain_map,
118 .unmap = mp_irqdomain_unmap,
119};
120
121static void __init MP_ioapic_info(struct mpc_ioapic *m) 116static void __init MP_ioapic_info(struct mpc_ioapic *m)
122{ 117{
123 struct ioapic_domain_cfg cfg = { 118 struct ioapic_domain_cfg cfg = {
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c614dd492f5f..58bcfb67c01f 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -154,7 +154,9 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
154 ret = paravirt_patch_ident_64(insnbuf, len); 154 ret = paravirt_patch_ident_64(insnbuf, len);
155 155
156 else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) || 156 else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
157#ifdef CONFIG_X86_32
157 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) || 158 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
159#endif
158 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) || 160 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) ||
159 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64)) 161 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64))
160 /* If operation requires a jmp, then jmp */ 162 /* If operation requires a jmp, then jmp */
@@ -371,7 +373,7 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
371 373
372 .load_sp0 = native_load_sp0, 374 .load_sp0 = native_load_sp0,
373 375
374#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION) 376#if defined(CONFIG_X86_32)
375 .irq_enable_sysexit = native_irq_enable_sysexit, 377 .irq_enable_sysexit = native_irq_enable_sysexit,
376#endif 378#endif
377#ifdef CONFIG_X86_64 379#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index a1fa86782186..8aa05583bc42 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -55,7 +55,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
55 PATCH_SITE(pv_irq_ops, save_fl); 55 PATCH_SITE(pv_irq_ops, save_fl);
56 PATCH_SITE(pv_irq_ops, irq_enable); 56 PATCH_SITE(pv_irq_ops, irq_enable);
57 PATCH_SITE(pv_irq_ops, irq_disable); 57 PATCH_SITE(pv_irq_ops, irq_disable);
58 PATCH_SITE(pv_cpu_ops, irq_enable_sysexit);
59 PATCH_SITE(pv_cpu_ops, usergs_sysret32); 58 PATCH_SITE(pv_cpu_ops, usergs_sysret32);
60 PATCH_SITE(pv_cpu_ops, usergs_sysret64); 59 PATCH_SITE(pv_cpu_ops, usergs_sysret64);
61 PATCH_SITE(pv_cpu_ops, swapgs); 60 PATCH_SITE(pv_cpu_ops, swapgs);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index deff651835b4..c09c99ccf3e3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -303,13 +303,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
303 arch_end_context_switch(next_p); 303 arch_end_context_switch(next_p);
304 304
305 /* 305 /*
306 * Reload esp0, kernel_stack, and current_top_of_stack. This changes 306 * Reload esp0 and cpu_current_top_of_stack. This changes
307 * current_thread_info(). 307 * current_thread_info().
308 */ 308 */
309 load_sp0(tss, next); 309 load_sp0(tss, next);
310 this_cpu_write(kernel_stack,
311 (unsigned long)task_stack_page(next_p) +
312 THREAD_SIZE);
313 this_cpu_write(cpu_current_top_of_stack, 310 this_cpu_write(cpu_current_top_of_stack,
314 (unsigned long)task_stack_page(next_p) + 311 (unsigned long)task_stack_page(next_p) +
315 THREAD_SIZE); 312 THREAD_SIZE);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index c50e013b57d2..843f92e4c711 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -410,9 +410,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
410 /* Reload esp0 and ss1. This changes current_thread_info(). */ 410 /* Reload esp0 and ss1. This changes current_thread_info(). */
411 load_sp0(tss, next); 411 load_sp0(tss, next);
412 412
413 this_cpu_write(kernel_stack,
414 (unsigned long)task_stack_page(next_p) + THREAD_SIZE);
415
416 /* 413 /*
417 * Now maybe reload the debug registers and handle I/O bitmaps 414 * Now maybe reload the debug registers and handle I/O bitmaps
418 */ 415 */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index cba828892790..265a6fdea8b7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1224,8 +1224,7 @@ void __init setup_arch(char **cmdline_p)
1224 init_cpu_to_node(); 1224 init_cpu_to_node();
1225 1225
1226 init_apic_mappings(); 1226 init_apic_mappings();
1227 if (x86_io_apic_ops.init) 1227 io_apic_init_mappings();
1228 x86_io_apic_ops.init();
1229 1228
1230 kvm_guest_init(); 1229 kvm_guest_init();
1231 1230
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index be8e1bde07aa..15aaa69bbb5e 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -170,8 +170,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
170 170
171asmlinkage __visible void smp_reboot_interrupt(void) 171asmlinkage __visible void smp_reboot_interrupt(void)
172{ 172{
173 ack_APIC_irq(); 173 ipi_entering_ack_irq();
174 irq_enter();
175 stop_this_cpu(NULL); 174 stop_this_cpu(NULL);
176 irq_exit(); 175 irq_exit();
177} 176}
@@ -265,12 +264,6 @@ __visible void smp_reschedule_interrupt(struct pt_regs *regs)
265 */ 264 */
266} 265}
267 266
268static inline void smp_entering_irq(void)
269{
270 ack_APIC_irq();
271 irq_enter();
272}
273
274__visible void smp_trace_reschedule_interrupt(struct pt_regs *regs) 267__visible void smp_trace_reschedule_interrupt(struct pt_regs *regs)
275{ 268{
276 /* 269 /*
@@ -279,7 +272,7 @@ __visible void smp_trace_reschedule_interrupt(struct pt_regs *regs)
279 * scheduler_ipi(). This is OK, since those functions are allowed 272 * scheduler_ipi(). This is OK, since those functions are allowed
280 * to nest. 273 * to nest.
281 */ 274 */
282 smp_entering_irq(); 275 ipi_entering_ack_irq();
283 trace_reschedule_entry(RESCHEDULE_VECTOR); 276 trace_reschedule_entry(RESCHEDULE_VECTOR);
284 __smp_reschedule_interrupt(); 277 __smp_reschedule_interrupt();
285 trace_reschedule_exit(RESCHEDULE_VECTOR); 278 trace_reschedule_exit(RESCHEDULE_VECTOR);
@@ -297,14 +290,14 @@ static inline void __smp_call_function_interrupt(void)
297 290
298__visible void smp_call_function_interrupt(struct pt_regs *regs) 291__visible void smp_call_function_interrupt(struct pt_regs *regs)
299{ 292{
300 smp_entering_irq(); 293 ipi_entering_ack_irq();
301 __smp_call_function_interrupt(); 294 __smp_call_function_interrupt();
302 exiting_irq(); 295 exiting_irq();
303} 296}
304 297
305__visible void smp_trace_call_function_interrupt(struct pt_regs *regs) 298__visible void smp_trace_call_function_interrupt(struct pt_regs *regs)
306{ 299{
307 smp_entering_irq(); 300 ipi_entering_ack_irq();
308 trace_call_function_entry(CALL_FUNCTION_VECTOR); 301 trace_call_function_entry(CALL_FUNCTION_VECTOR);
309 __smp_call_function_interrupt(); 302 __smp_call_function_interrupt();
310 trace_call_function_exit(CALL_FUNCTION_VECTOR); 303 trace_call_function_exit(CALL_FUNCTION_VECTOR);
@@ -319,14 +312,14 @@ static inline void __smp_call_function_single_interrupt(void)
319 312
320__visible void smp_call_function_single_interrupt(struct pt_regs *regs) 313__visible void smp_call_function_single_interrupt(struct pt_regs *regs)
321{ 314{
322 smp_entering_irq(); 315 ipi_entering_ack_irq();
323 __smp_call_function_single_interrupt(); 316 __smp_call_function_single_interrupt();
324 exiting_irq(); 317 exiting_irq();
325} 318}
326 319
327__visible void smp_trace_call_function_single_interrupt(struct pt_regs *regs) 320__visible void smp_trace_call_function_single_interrupt(struct pt_regs *regs)
328{ 321{
329 smp_entering_irq(); 322 ipi_entering_ack_irq();
330 trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR); 323 trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
331 __smp_call_function_single_interrupt(); 324 __smp_call_function_single_interrupt();
332 trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR); 325 trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6d4bfea25874..8add66b22f33 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -515,6 +515,40 @@ void __inquire_remote_apic(int apicid)
515} 515}
516 516
517/* 517/*
518 * The Multiprocessor Specification 1.4 (1997) example code suggests
519 * that there should be a 10ms delay between the BSP asserting INIT
520 * and de-asserting INIT, when starting a remote processor.
521 * But that slows boot and resume on modern processors, which include
522 * many cores and don't require that delay.
523 *
524 * Cmdline "init_cpu_udelay=" is available to over-ride this delay.
525 * Modern processor families are quirked to remove the delay entirely.
526 */
527#define UDELAY_10MS_DEFAULT 10000
528
529static unsigned int init_udelay = UDELAY_10MS_DEFAULT;
530
531static int __init cpu_init_udelay(char *str)
532{
533 get_option(&str, &init_udelay);
534
535 return 0;
536}
537early_param("cpu_init_udelay", cpu_init_udelay);
538
539static void __init smp_quirk_init_udelay(void)
540{
541 /* if cmdline changed it from default, leave it alone */
542 if (init_udelay != UDELAY_10MS_DEFAULT)
543 return;
544
545 /* if modern processor, use no delay */
546 if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && (boot_cpu_data.x86 == 6)) ||
547 ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0xF)))
548 init_udelay = 0;
549}
550
551/*
518 * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal 552 * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
519 * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this 553 * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
520 * won't ... remember to clear down the APIC, etc later. 554 * won't ... remember to clear down the APIC, etc later.
@@ -556,7 +590,7 @@ wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
556static int 590static int
557wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip) 591wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
558{ 592{
559 unsigned long send_status, accept_status = 0; 593 unsigned long send_status = 0, accept_status = 0;
560 int maxlvt, num_starts, j; 594 int maxlvt, num_starts, j;
561 595
562 maxlvt = lapic_get_maxlvt(); 596 maxlvt = lapic_get_maxlvt();
@@ -584,7 +618,7 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
584 pr_debug("Waiting for send to finish...\n"); 618 pr_debug("Waiting for send to finish...\n");
585 send_status = safe_apic_wait_icr_idle(); 619 send_status = safe_apic_wait_icr_idle();
586 620
587 mdelay(10); 621 udelay(init_udelay);
588 622
589 pr_debug("Deasserting INIT\n"); 623 pr_debug("Deasserting INIT\n");
590 624
@@ -652,6 +686,7 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
652 * Give the other CPU some time to accept the IPI. 686 * Give the other CPU some time to accept the IPI.
653 */ 687 */
654 udelay(200); 688 udelay(200);
689
655 if (maxlvt > 3) /* Due to the Pentium erratum 3AP. */ 690 if (maxlvt > 3) /* Due to the Pentium erratum 3AP. */
656 apic_write(APIC_ESR, 0); 691 apic_write(APIC_ESR, 0);
657 accept_status = (apic_read(APIC_ESR) & 0xEF); 692 accept_status = (apic_read(APIC_ESR) & 0xEF);
@@ -793,8 +828,6 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
793 clear_tsk_thread_flag(idle, TIF_FORK); 828 clear_tsk_thread_flag(idle, TIF_FORK);
794 initial_gs = per_cpu_offset(cpu); 829 initial_gs = per_cpu_offset(cpu);
795#endif 830#endif
796 per_cpu(kernel_stack, cpu) =
797 (unsigned long)task_stack_page(idle) + THREAD_SIZE;
798} 831}
799 832
800/* 833/*
@@ -1177,6 +1210,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
1177 uv_system_init(); 1210 uv_system_init();
1178 1211
1179 set_mtrr_aps_delayed_init(); 1212 set_mtrr_aps_delayed_init();
1213
1214 smp_quirk_init_udelay();
1180} 1215}
1181 1216
1182void arch_enable_nonboot_cpus_begin(void) 1217void arch_enable_nonboot_cpus_begin(void)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 36cb15b7b367..f5791927aa64 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -73,8 +73,7 @@ gate_desc debug_idt_table[NR_VECTORS] __page_aligned_bss;
73#else 73#else
74#include <asm/processor-flags.h> 74#include <asm/processor-flags.h>
75#include <asm/setup.h> 75#include <asm/setup.h>
76 76#include <asm/proto.h>
77asmlinkage int system_call(void);
78#endif 77#endif
79 78
80/* Must be page-aligned because the real IDT is used in a fixmap. */ 79/* Must be page-aligned because the real IDT is used in a fixmap. */
@@ -769,18 +768,6 @@ dotraplinkage void
769do_spurious_interrupt_bug(struct pt_regs *regs, long error_code) 768do_spurious_interrupt_bug(struct pt_regs *regs, long error_code)
770{ 769{
771 conditional_sti(regs); 770 conditional_sti(regs);
772#if 0
773 /* No need to warn about this any longer. */
774 pr_info("Ignoring P6 Local APIC Spurious Interrupt Bug...\n");
775#endif
776}
777
778asmlinkage __visible void __attribute__((weak)) smp_thermal_interrupt(void)
779{
780}
781
782asmlinkage __visible void __attribute__((weak)) smp_threshold_interrupt(void)
783{
784} 771}
785 772
786dotraplinkage void 773dotraplinkage void
@@ -906,13 +893,13 @@ void __init trap_init(void)
906 set_bit(i, used_vectors); 893 set_bit(i, used_vectors);
907 894
908#ifdef CONFIG_IA32_EMULATION 895#ifdef CONFIG_IA32_EMULATION
909 set_system_intr_gate(IA32_SYSCALL_VECTOR, ia32_syscall); 896 set_system_intr_gate(IA32_SYSCALL_VECTOR, entry_INT80_compat);
910 set_bit(IA32_SYSCALL_VECTOR, used_vectors); 897 set_bit(IA32_SYSCALL_VECTOR, used_vectors);
911#endif 898#endif
912 899
913#ifdef CONFIG_X86_32 900#ifdef CONFIG_X86_32
914 set_system_trap_gate(SYSCALL_VECTOR, &system_call); 901 set_system_trap_gate(IA32_SYSCALL_VECTOR, entry_INT80_32);
915 set_bit(SYSCALL_VECTOR, used_vectors); 902 set_bit(IA32_SYSCALL_VECTOR, used_vectors);
916#endif 903#endif
917 904
918 /* 905 /*
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 234b0722de53..3cee10abf01d 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -111,11 +111,9 @@ EXPORT_SYMBOL_GPL(x86_platform);
111#if defined(CONFIG_PCI_MSI) 111#if defined(CONFIG_PCI_MSI)
112struct x86_msi_ops x86_msi = { 112struct x86_msi_ops x86_msi = {
113 .setup_msi_irqs = native_setup_msi_irqs, 113 .setup_msi_irqs = native_setup_msi_irqs,
114 .compose_msi_msg = native_compose_msi_msg,
115 .teardown_msi_irq = native_teardown_msi_irq, 114 .teardown_msi_irq = native_teardown_msi_irq,
116 .teardown_msi_irqs = default_teardown_msi_irqs, 115 .teardown_msi_irqs = default_teardown_msi_irqs,
117 .restore_msi_irqs = default_restore_msi_irqs, 116 .restore_msi_irqs = default_restore_msi_irqs,
118 .setup_hpet_msi = default_setup_hpet_msi,
119}; 117};
120 118
121/* MSI arch specific hooks */ 119/* MSI arch specific hooks */
@@ -141,13 +139,6 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
141#endif 139#endif
142 140
143struct x86_io_apic_ops x86_io_apic_ops = { 141struct x86_io_apic_ops x86_io_apic_ops = {
144 .init = native_io_apic_init_mappings,
145 .read = native_io_apic_read, 142 .read = native_io_apic_read,
146 .write = native_io_apic_write,
147 .modify = native_io_apic_modify,
148 .disable = native_disable_io_apic, 143 .disable = native_disable_io_apic,
149 .print_entries = native_io_apic_print_entries,
150 .set_affinity = native_ioapic_set_affinity,
151 .setup_entry = native_setup_ioapic_entry,
152 .eoi_ioapic_pin = native_eoi_ioapic_pin,
153}; 144};
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 27f8eea0d6eb..f2dc08c003eb 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -90,7 +90,7 @@ struct lguest_data lguest_data = {
90 .noirq_iret = (u32)lguest_noirq_iret, 90 .noirq_iret = (u32)lguest_noirq_iret,
91 .kernel_address = PAGE_OFFSET, 91 .kernel_address = PAGE_OFFSET,
92 .blocked_interrupts = { 1 }, /* Block timer interrupts */ 92 .blocked_interrupts = { 1 }, /* Block timer interrupts */
93 .syscall_vec = SYSCALL_VECTOR, 93 .syscall_vec = IA32_SYSCALL_VECTOR,
94}; 94};
95 95
96/*G:037 96/*G:037
@@ -866,7 +866,7 @@ static void __init lguest_init_IRQ(void)
866 for (i = FIRST_EXTERNAL_VECTOR; i < FIRST_SYSTEM_VECTOR; i++) { 866 for (i = FIRST_EXTERNAL_VECTOR; i < FIRST_SYSTEM_VECTOR; i++) {
867 /* Some systems map "vectors" to interrupts weirdly. Not us! */ 867 /* Some systems map "vectors" to interrupts weirdly. Not us! */
868 __this_cpu_write(vector_irq[i], i - FIRST_EXTERNAL_VECTOR); 868 __this_cpu_write(vector_irq[i], i - FIRST_EXTERNAL_VECTOR);
869 if (i != SYSCALL_VECTOR) 869 if (i != IA32_SYSCALL_VECTOR)
870 set_intr_gate(i, irq_entries_start + 870 set_intr_gate(i, irq_entries_start +
871 8 * (i - FIRST_EXTERNAL_VECTOR)); 871 8 * (i - FIRST_EXTERNAL_VECTOR));
872 } 872 }
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 1530afb07c85..f2587888d987 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -17,7 +17,6 @@ clean-files := inat-tables.c
17obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o 17obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
18 18
19lib-y := delay.o misc.o cmdline.o 19lib-y := delay.o misc.o cmdline.o
20lib-y += thunk_$(BITS).o
21lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o 20lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
22lib-y += memcpy_$(BITS).o 21lib-y += memcpy_$(BITS).o
23lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o 22lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
@@ -40,6 +39,6 @@ else
40 lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o 39 lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
41 lib-y += clear_page_64.o copy_page_64.o 40 lib-y += clear_page_64.o copy_page_64.o
42 lib-y += memmove_64.o memset_64.o 41 lib-y += memmove_64.o memset_64.o
43 lib-y += copy_user_64.o copy_user_nocache_64.o 42 lib-y += copy_user_64.o
44 lib-y += cmpxchg16b_emu.o 43 lib-y += cmpxchg16b_emu.o
45endif 44endif
diff --git a/arch/x86/lib/atomic64_386_32.S b/arch/x86/lib/atomic64_386_32.S
index 00933d5e992f..9b0ca8fe80fc 100644
--- a/arch/x86/lib/atomic64_386_32.S
+++ b/arch/x86/lib/atomic64_386_32.S
@@ -11,26 +11,23 @@
11 11
12#include <linux/linkage.h> 12#include <linux/linkage.h>
13#include <asm/alternative-asm.h> 13#include <asm/alternative-asm.h>
14#include <asm/dwarf2.h>
15 14
16/* if you want SMP support, implement these with real spinlocks */ 15/* if you want SMP support, implement these with real spinlocks */
17.macro LOCK reg 16.macro LOCK reg
18 pushfl_cfi 17 pushfl
19 cli 18 cli
20.endm 19.endm
21 20
22.macro UNLOCK reg 21.macro UNLOCK reg
23 popfl_cfi 22 popfl
24.endm 23.endm
25 24
26#define BEGIN(op) \ 25#define BEGIN(op) \
27.macro endp; \ 26.macro endp; \
28 CFI_ENDPROC; \
29ENDPROC(atomic64_##op##_386); \ 27ENDPROC(atomic64_##op##_386); \
30.purgem endp; \ 28.purgem endp; \
31.endm; \ 29.endm; \
32ENTRY(atomic64_##op##_386); \ 30ENTRY(atomic64_##op##_386); \
33 CFI_STARTPROC; \
34 LOCK v; 31 LOCK v;
35 32
36#define ENDP endp 33#define ENDP endp
diff --git a/arch/x86/lib/atomic64_cx8_32.S b/arch/x86/lib/atomic64_cx8_32.S
index 082a85167a5b..db3ae85440ff 100644
--- a/arch/x86/lib/atomic64_cx8_32.S
+++ b/arch/x86/lib/atomic64_cx8_32.S
@@ -11,7 +11,6 @@
11 11
12#include <linux/linkage.h> 12#include <linux/linkage.h>
13#include <asm/alternative-asm.h> 13#include <asm/alternative-asm.h>
14#include <asm/dwarf2.h>
15 14
16.macro read64 reg 15.macro read64 reg
17 movl %ebx, %eax 16 movl %ebx, %eax
@@ -22,16 +21,11 @@
22.endm 21.endm
23 22
24ENTRY(atomic64_read_cx8) 23ENTRY(atomic64_read_cx8)
25 CFI_STARTPROC
26
27 read64 %ecx 24 read64 %ecx
28 ret 25 ret
29 CFI_ENDPROC
30ENDPROC(atomic64_read_cx8) 26ENDPROC(atomic64_read_cx8)
31 27
32ENTRY(atomic64_set_cx8) 28ENTRY(atomic64_set_cx8)
33 CFI_STARTPROC
34
351: 291:
36/* we don't need LOCK_PREFIX since aligned 64-bit writes 30/* we don't need LOCK_PREFIX since aligned 64-bit writes
37 * are atomic on 586 and newer */ 31 * are atomic on 586 and newer */
@@ -39,28 +33,23 @@ ENTRY(atomic64_set_cx8)
39 jne 1b 33 jne 1b
40 34
41 ret 35 ret
42 CFI_ENDPROC
43ENDPROC(atomic64_set_cx8) 36ENDPROC(atomic64_set_cx8)
44 37
45ENTRY(atomic64_xchg_cx8) 38ENTRY(atomic64_xchg_cx8)
46 CFI_STARTPROC
47
481: 391:
49 LOCK_PREFIX 40 LOCK_PREFIX
50 cmpxchg8b (%esi) 41 cmpxchg8b (%esi)
51 jne 1b 42 jne 1b
52 43
53 ret 44 ret
54 CFI_ENDPROC
55ENDPROC(atomic64_xchg_cx8) 45ENDPROC(atomic64_xchg_cx8)
56 46
57.macro addsub_return func ins insc 47.macro addsub_return func ins insc
58ENTRY(atomic64_\func\()_return_cx8) 48ENTRY(atomic64_\func\()_return_cx8)
59 CFI_STARTPROC 49 pushl %ebp
60 pushl_cfi_reg ebp 50 pushl %ebx
61 pushl_cfi_reg ebx 51 pushl %esi
62 pushl_cfi_reg esi 52 pushl %edi
63 pushl_cfi_reg edi
64 53
65 movl %eax, %esi 54 movl %eax, %esi
66 movl %edx, %edi 55 movl %edx, %edi
@@ -79,12 +68,11 @@ ENTRY(atomic64_\func\()_return_cx8)
7910: 6810:
80 movl %ebx, %eax 69 movl %ebx, %eax
81 movl %ecx, %edx 70 movl %ecx, %edx
82 popl_cfi_reg edi 71 popl %edi
83 popl_cfi_reg esi 72 popl %esi
84 popl_cfi_reg ebx 73 popl %ebx
85 popl_cfi_reg ebp 74 popl %ebp
86 ret 75 ret
87 CFI_ENDPROC
88ENDPROC(atomic64_\func\()_return_cx8) 76ENDPROC(atomic64_\func\()_return_cx8)
89.endm 77.endm
90 78
@@ -93,8 +81,7 @@ addsub_return sub sub sbb
93 81
94.macro incdec_return func ins insc 82.macro incdec_return func ins insc
95ENTRY(atomic64_\func\()_return_cx8) 83ENTRY(atomic64_\func\()_return_cx8)
96 CFI_STARTPROC 84 pushl %ebx
97 pushl_cfi_reg ebx
98 85
99 read64 %esi 86 read64 %esi
1001: 871:
@@ -109,9 +96,8 @@ ENTRY(atomic64_\func\()_return_cx8)
10910: 9610:
110 movl %ebx, %eax 97 movl %ebx, %eax
111 movl %ecx, %edx 98 movl %ecx, %edx
112 popl_cfi_reg ebx 99 popl %ebx
113 ret 100 ret
114 CFI_ENDPROC
115ENDPROC(atomic64_\func\()_return_cx8) 101ENDPROC(atomic64_\func\()_return_cx8)
116.endm 102.endm
117 103
@@ -119,8 +105,7 @@ incdec_return inc add adc
119incdec_return dec sub sbb 105incdec_return dec sub sbb
120 106
121ENTRY(atomic64_dec_if_positive_cx8) 107ENTRY(atomic64_dec_if_positive_cx8)
122 CFI_STARTPROC 108 pushl %ebx
123 pushl_cfi_reg ebx
124 109
125 read64 %esi 110 read64 %esi
1261: 1111:
@@ -136,18 +121,16 @@ ENTRY(atomic64_dec_if_positive_cx8)
1362: 1212:
137 movl %ebx, %eax 122 movl %ebx, %eax
138 movl %ecx, %edx 123 movl %ecx, %edx
139 popl_cfi_reg ebx 124 popl %ebx
140 ret 125 ret
141 CFI_ENDPROC
142ENDPROC(atomic64_dec_if_positive_cx8) 126ENDPROC(atomic64_dec_if_positive_cx8)
143 127
144ENTRY(atomic64_add_unless_cx8) 128ENTRY(atomic64_add_unless_cx8)
145 CFI_STARTPROC 129 pushl %ebp
146 pushl_cfi_reg ebp 130 pushl %ebx
147 pushl_cfi_reg ebx
148/* these just push these two parameters on the stack */ 131/* these just push these two parameters on the stack */
149 pushl_cfi_reg edi 132 pushl %edi
150 pushl_cfi_reg ecx 133 pushl %ecx
151 134
152 movl %eax, %ebp 135 movl %eax, %ebp
153 movl %edx, %edi 136 movl %edx, %edi
@@ -168,21 +151,18 @@ ENTRY(atomic64_add_unless_cx8)
168 movl $1, %eax 151 movl $1, %eax
1693: 1523:
170 addl $8, %esp 153 addl $8, %esp
171 CFI_ADJUST_CFA_OFFSET -8 154 popl %ebx
172 popl_cfi_reg ebx 155 popl %ebp
173 popl_cfi_reg ebp
174 ret 156 ret
1754: 1574:
176 cmpl %edx, 4(%esp) 158 cmpl %edx, 4(%esp)
177 jne 2b 159 jne 2b
178 xorl %eax, %eax 160 xorl %eax, %eax
179 jmp 3b 161 jmp 3b
180 CFI_ENDPROC
181ENDPROC(atomic64_add_unless_cx8) 162ENDPROC(atomic64_add_unless_cx8)
182 163
183ENTRY(atomic64_inc_not_zero_cx8) 164ENTRY(atomic64_inc_not_zero_cx8)
184 CFI_STARTPROC 165 pushl %ebx
185 pushl_cfi_reg ebx
186 166
187 read64 %esi 167 read64 %esi
1881: 1681:
@@ -199,7 +179,6 @@ ENTRY(atomic64_inc_not_zero_cx8)
199 179
200 movl $1, %eax 180 movl $1, %eax
2013: 1813:
202 popl_cfi_reg ebx 182 popl %ebx
203 ret 183 ret
204 CFI_ENDPROC
205ENDPROC(atomic64_inc_not_zero_cx8) 184ENDPROC(atomic64_inc_not_zero_cx8)
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 9bc944a91274..c1e623209853 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -26,7 +26,6 @@
26 */ 26 */
27 27
28#include <linux/linkage.h> 28#include <linux/linkage.h>
29#include <asm/dwarf2.h>
30#include <asm/errno.h> 29#include <asm/errno.h>
31#include <asm/asm.h> 30#include <asm/asm.h>
32 31
@@ -50,9 +49,8 @@ unsigned int csum_partial(const unsigned char * buff, int len, unsigned int sum)
50 * alignment for the unrolled loop. 49 * alignment for the unrolled loop.
51 */ 50 */
52ENTRY(csum_partial) 51ENTRY(csum_partial)
53 CFI_STARTPROC 52 pushl %esi
54 pushl_cfi_reg esi 53 pushl %ebx
55 pushl_cfi_reg ebx
56 movl 20(%esp),%eax # Function arg: unsigned int sum 54 movl 20(%esp),%eax # Function arg: unsigned int sum
57 movl 16(%esp),%ecx # Function arg: int len 55 movl 16(%esp),%ecx # Function arg: int len
58 movl 12(%esp),%esi # Function arg: unsigned char *buff 56 movl 12(%esp),%esi # Function arg: unsigned char *buff
@@ -129,10 +127,9 @@ ENTRY(csum_partial)
129 jz 8f 127 jz 8f
130 roll $8, %eax 128 roll $8, %eax
1318: 1298:
132 popl_cfi_reg ebx 130 popl %ebx
133 popl_cfi_reg esi 131 popl %esi
134 ret 132 ret
135 CFI_ENDPROC
136ENDPROC(csum_partial) 133ENDPROC(csum_partial)
137 134
138#else 135#else
@@ -140,9 +137,8 @@ ENDPROC(csum_partial)
140/* Version for PentiumII/PPro */ 137/* Version for PentiumII/PPro */
141 138
142ENTRY(csum_partial) 139ENTRY(csum_partial)
143 CFI_STARTPROC 140 pushl %esi
144 pushl_cfi_reg esi 141 pushl %ebx
145 pushl_cfi_reg ebx
146 movl 20(%esp),%eax # Function arg: unsigned int sum 142 movl 20(%esp),%eax # Function arg: unsigned int sum
147 movl 16(%esp),%ecx # Function arg: int len 143 movl 16(%esp),%ecx # Function arg: int len
148 movl 12(%esp),%esi # Function arg: const unsigned char *buf 144 movl 12(%esp),%esi # Function arg: const unsigned char *buf
@@ -249,10 +245,9 @@ ENTRY(csum_partial)
249 jz 90f 245 jz 90f
250 roll $8, %eax 246 roll $8, %eax
25190: 24790:
252 popl_cfi_reg ebx 248 popl %ebx
253 popl_cfi_reg esi 249 popl %esi
254 ret 250 ret
255 CFI_ENDPROC
256ENDPROC(csum_partial) 251ENDPROC(csum_partial)
257 252
258#endif 253#endif
@@ -287,12 +282,10 @@ unsigned int csum_partial_copy_generic (const char *src, char *dst,
287#define FP 12 282#define FP 12
288 283
289ENTRY(csum_partial_copy_generic) 284ENTRY(csum_partial_copy_generic)
290 CFI_STARTPROC
291 subl $4,%esp 285 subl $4,%esp
292 CFI_ADJUST_CFA_OFFSET 4 286 pushl %edi
293 pushl_cfi_reg edi 287 pushl %esi
294 pushl_cfi_reg esi 288 pushl %ebx
295 pushl_cfi_reg ebx
296 movl ARGBASE+16(%esp),%eax # sum 289 movl ARGBASE+16(%esp),%eax # sum
297 movl ARGBASE+12(%esp),%ecx # len 290 movl ARGBASE+12(%esp),%ecx # len
298 movl ARGBASE+4(%esp),%esi # src 291 movl ARGBASE+4(%esp),%esi # src
@@ -401,12 +394,11 @@ DST( movb %cl, (%edi) )
401 394
402.previous 395.previous
403 396
404 popl_cfi_reg ebx 397 popl %ebx
405 popl_cfi_reg esi 398 popl %esi
406 popl_cfi_reg edi 399 popl %edi
407 popl_cfi %ecx # equivalent to addl $4,%esp 400 popl %ecx # equivalent to addl $4,%esp
408 ret 401 ret
409 CFI_ENDPROC
410ENDPROC(csum_partial_copy_generic) 402ENDPROC(csum_partial_copy_generic)
411 403
412#else 404#else
@@ -426,10 +418,9 @@ ENDPROC(csum_partial_copy_generic)
426#define ARGBASE 12 418#define ARGBASE 12
427 419
428ENTRY(csum_partial_copy_generic) 420ENTRY(csum_partial_copy_generic)
429 CFI_STARTPROC 421 pushl %ebx
430 pushl_cfi_reg ebx 422 pushl %edi
431 pushl_cfi_reg edi 423 pushl %esi
432 pushl_cfi_reg esi
433 movl ARGBASE+4(%esp),%esi #src 424 movl ARGBASE+4(%esp),%esi #src
434 movl ARGBASE+8(%esp),%edi #dst 425 movl ARGBASE+8(%esp),%edi #dst
435 movl ARGBASE+12(%esp),%ecx #len 426 movl ARGBASE+12(%esp),%ecx #len
@@ -489,11 +480,10 @@ DST( movb %dl, (%edi) )
489 jmp 7b 480 jmp 7b
490.previous 481.previous
491 482
492 popl_cfi_reg esi 483 popl %esi
493 popl_cfi_reg edi 484 popl %edi
494 popl_cfi_reg ebx 485 popl %ebx
495 ret 486 ret
496 CFI_ENDPROC
497ENDPROC(csum_partial_copy_generic) 487ENDPROC(csum_partial_copy_generic)
498 488
499#undef ROUND 489#undef ROUND
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index e67e579c93bd..a2fe51b00cce 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -1,5 +1,4 @@
1#include <linux/linkage.h> 1#include <linux/linkage.h>
2#include <asm/dwarf2.h>
3#include <asm/cpufeature.h> 2#include <asm/cpufeature.h>
4#include <asm/alternative-asm.h> 3#include <asm/alternative-asm.h>
5 4
@@ -15,7 +14,6 @@
15 * %rdi - page 14 * %rdi - page
16 */ 15 */
17ENTRY(clear_page) 16ENTRY(clear_page)
18 CFI_STARTPROC
19 17
20 ALTERNATIVE_2 "jmp clear_page_orig", "", X86_FEATURE_REP_GOOD, \ 18 ALTERNATIVE_2 "jmp clear_page_orig", "", X86_FEATURE_REP_GOOD, \
21 "jmp clear_page_c_e", X86_FEATURE_ERMS 19 "jmp clear_page_c_e", X86_FEATURE_ERMS
@@ -24,11 +22,9 @@ ENTRY(clear_page)
24 xorl %eax,%eax 22 xorl %eax,%eax
25 rep stosq 23 rep stosq
26 ret 24 ret
27 CFI_ENDPROC
28ENDPROC(clear_page) 25ENDPROC(clear_page)
29 26
30ENTRY(clear_page_orig) 27ENTRY(clear_page_orig)
31 CFI_STARTPROC
32 28
33 xorl %eax,%eax 29 xorl %eax,%eax
34 movl $4096/64,%ecx 30 movl $4096/64,%ecx
@@ -48,14 +44,11 @@ ENTRY(clear_page_orig)
48 jnz .Lloop 44 jnz .Lloop
49 nop 45 nop
50 ret 46 ret
51 CFI_ENDPROC
52ENDPROC(clear_page_orig) 47ENDPROC(clear_page_orig)
53 48
54ENTRY(clear_page_c_e) 49ENTRY(clear_page_c_e)
55 CFI_STARTPROC
56 movl $4096,%ecx 50 movl $4096,%ecx
57 xorl %eax,%eax 51 xorl %eax,%eax
58 rep stosb 52 rep stosb
59 ret 53 ret
60 CFI_ENDPROC
61ENDPROC(clear_page_c_e) 54ENDPROC(clear_page_c_e)
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 40a172541ee2..9b330242e740 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -6,7 +6,6 @@
6 * 6 *
7 */ 7 */
8#include <linux/linkage.h> 8#include <linux/linkage.h>
9#include <asm/dwarf2.h>
10#include <asm/percpu.h> 9#include <asm/percpu.h>
11 10
12.text 11.text
@@ -21,7 +20,6 @@
21 * %al : Operation successful 20 * %al : Operation successful
22 */ 21 */
23ENTRY(this_cpu_cmpxchg16b_emu) 22ENTRY(this_cpu_cmpxchg16b_emu)
24CFI_STARTPROC
25 23
26# 24#
27# Emulate 'cmpxchg16b %gs:(%rsi)' except we return the result in %al not 25# Emulate 'cmpxchg16b %gs:(%rsi)' except we return the result in %al not
@@ -32,7 +30,7 @@ CFI_STARTPROC
32# *atomic* on a single cpu (as provided by the this_cpu_xx class of 30# *atomic* on a single cpu (as provided by the this_cpu_xx class of
33# macros). 31# macros).
34# 32#
35 pushfq_cfi 33 pushfq
36 cli 34 cli
37 35
38 cmpq PER_CPU_VAR((%rsi)), %rax 36 cmpq PER_CPU_VAR((%rsi)), %rax
@@ -43,17 +41,13 @@ CFI_STARTPROC
43 movq %rbx, PER_CPU_VAR((%rsi)) 41 movq %rbx, PER_CPU_VAR((%rsi))
44 movq %rcx, PER_CPU_VAR(8(%rsi)) 42 movq %rcx, PER_CPU_VAR(8(%rsi))
45 43
46 CFI_REMEMBER_STATE 44 popfq
47 popfq_cfi
48 mov $1, %al 45 mov $1, %al
49 ret 46 ret
50 47
51 CFI_RESTORE_STATE
52.Lnot_same: 48.Lnot_same:
53 popfq_cfi 49 popfq
54 xor %al,%al 50 xor %al,%al
55 ret 51 ret
56 52
57CFI_ENDPROC
58
59ENDPROC(this_cpu_cmpxchg16b_emu) 53ENDPROC(this_cpu_cmpxchg16b_emu)
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
index b4807fce5177..ad5349778490 100644
--- a/arch/x86/lib/cmpxchg8b_emu.S
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -7,7 +7,6 @@
7 */ 7 */
8 8
9#include <linux/linkage.h> 9#include <linux/linkage.h>
10#include <asm/dwarf2.h>
11 10
12.text 11.text
13 12
@@ -20,14 +19,13 @@
20 * %ecx : high 32 bits of new value 19 * %ecx : high 32 bits of new value
21 */ 20 */
22ENTRY(cmpxchg8b_emu) 21ENTRY(cmpxchg8b_emu)
23CFI_STARTPROC
24 22
25# 23#
26# Emulate 'cmpxchg8b (%esi)' on UP except we don't 24# Emulate 'cmpxchg8b (%esi)' on UP except we don't
27# set the whole ZF thing (caller will just compare 25# set the whole ZF thing (caller will just compare
28# eax:edx with the expected value) 26# eax:edx with the expected value)
29# 27#
30 pushfl_cfi 28 pushfl
31 cli 29 cli
32 30
33 cmpl (%esi), %eax 31 cmpl (%esi), %eax
@@ -38,18 +36,15 @@ CFI_STARTPROC
38 movl %ebx, (%esi) 36 movl %ebx, (%esi)
39 movl %ecx, 4(%esi) 37 movl %ecx, 4(%esi)
40 38
41 CFI_REMEMBER_STATE 39 popfl
42 popfl_cfi
43 ret 40 ret
44 41
45 CFI_RESTORE_STATE
46.Lnot_same: 42.Lnot_same:
47 movl (%esi), %eax 43 movl (%esi), %eax
48.Lhalf_same: 44.Lhalf_same:
49 movl 4(%esi), %edx 45 movl 4(%esi), %edx
50 46
51 popfl_cfi 47 popfl
52 ret 48 ret
53 49
54CFI_ENDPROC
55ENDPROC(cmpxchg8b_emu) 50ENDPROC(cmpxchg8b_emu)
diff --git a/arch/x86/lib/copy_page_64.S b/arch/x86/lib/copy_page_64.S
index 8239dbcbf984..009f98216b7e 100644
--- a/arch/x86/lib/copy_page_64.S
+++ b/arch/x86/lib/copy_page_64.S
@@ -1,7 +1,6 @@
1/* Written 2003 by Andi Kleen, based on a kernel by Evandro Menezes */ 1/* Written 2003 by Andi Kleen, based on a kernel by Evandro Menezes */
2 2
3#include <linux/linkage.h> 3#include <linux/linkage.h>
4#include <asm/dwarf2.h>
5#include <asm/cpufeature.h> 4#include <asm/cpufeature.h>
6#include <asm/alternative-asm.h> 5#include <asm/alternative-asm.h>
7 6
@@ -13,22 +12,16 @@
13 */ 12 */
14 ALIGN 13 ALIGN
15ENTRY(copy_page) 14ENTRY(copy_page)
16 CFI_STARTPROC
17 ALTERNATIVE "jmp copy_page_regs", "", X86_FEATURE_REP_GOOD 15 ALTERNATIVE "jmp copy_page_regs", "", X86_FEATURE_REP_GOOD
18 movl $4096/8, %ecx 16 movl $4096/8, %ecx
19 rep movsq 17 rep movsq
20 ret 18 ret
21 CFI_ENDPROC
22ENDPROC(copy_page) 19ENDPROC(copy_page)
23 20
24ENTRY(copy_page_regs) 21ENTRY(copy_page_regs)
25 CFI_STARTPROC
26 subq $2*8, %rsp 22 subq $2*8, %rsp
27 CFI_ADJUST_CFA_OFFSET 2*8
28 movq %rbx, (%rsp) 23 movq %rbx, (%rsp)
29 CFI_REL_OFFSET rbx, 0
30 movq %r12, 1*8(%rsp) 24 movq %r12, 1*8(%rsp)
31 CFI_REL_OFFSET r12, 1*8
32 25
33 movl $(4096/64)-5, %ecx 26 movl $(4096/64)-5, %ecx
34 .p2align 4 27 .p2align 4
@@ -87,11 +80,7 @@ ENTRY(copy_page_regs)
87 jnz .Loop2 80 jnz .Loop2
88 81
89 movq (%rsp), %rbx 82 movq (%rsp), %rbx
90 CFI_RESTORE rbx
91 movq 1*8(%rsp), %r12 83 movq 1*8(%rsp), %r12
92 CFI_RESTORE r12
93 addq $2*8, %rsp 84 addq $2*8, %rsp
94 CFI_ADJUST_CFA_OFFSET -2*8
95 ret 85 ret
96 CFI_ENDPROC
97ENDPROC(copy_page_regs) 86ENDPROC(copy_page_regs)
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index fa997dfaef24..982ce34f4a9b 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -7,7 +7,6 @@
7 */ 7 */
8 8
9#include <linux/linkage.h> 9#include <linux/linkage.h>
10#include <asm/dwarf2.h>
11#include <asm/current.h> 10#include <asm/current.h>
12#include <asm/asm-offsets.h> 11#include <asm/asm-offsets.h>
13#include <asm/thread_info.h> 12#include <asm/thread_info.h>
@@ -16,33 +15,8 @@
16#include <asm/asm.h> 15#include <asm/asm.h>
17#include <asm/smap.h> 16#include <asm/smap.h>
18 17
19 .macro ALIGN_DESTINATION
20 /* check for bad alignment of destination */
21 movl %edi,%ecx
22 andl $7,%ecx
23 jz 102f /* already aligned */
24 subl $8,%ecx
25 negl %ecx
26 subl %ecx,%edx
27100: movb (%rsi),%al
28101: movb %al,(%rdi)
29 incq %rsi
30 incq %rdi
31 decl %ecx
32 jnz 100b
33102:
34 .section .fixup,"ax"
35103: addl %ecx,%edx /* ecx is zerorest also */
36 jmp copy_user_handle_tail
37 .previous
38
39 _ASM_EXTABLE(100b,103b)
40 _ASM_EXTABLE(101b,103b)
41 .endm
42
43/* Standard copy_to_user with segment limit checking */ 18/* Standard copy_to_user with segment limit checking */
44ENTRY(_copy_to_user) 19ENTRY(_copy_to_user)
45 CFI_STARTPROC
46 GET_THREAD_INFO(%rax) 20 GET_THREAD_INFO(%rax)
47 movq %rdi,%rcx 21 movq %rdi,%rcx
48 addq %rdx,%rcx 22 addq %rdx,%rcx
@@ -54,12 +28,10 @@ ENTRY(_copy_to_user)
54 X86_FEATURE_REP_GOOD, \ 28 X86_FEATURE_REP_GOOD, \
55 "jmp copy_user_enhanced_fast_string", \ 29 "jmp copy_user_enhanced_fast_string", \
56 X86_FEATURE_ERMS 30 X86_FEATURE_ERMS
57 CFI_ENDPROC
58ENDPROC(_copy_to_user) 31ENDPROC(_copy_to_user)
59 32
60/* Standard copy_from_user with segment limit checking */ 33/* Standard copy_from_user with segment limit checking */
61ENTRY(_copy_from_user) 34ENTRY(_copy_from_user)
62 CFI_STARTPROC
63 GET_THREAD_INFO(%rax) 35 GET_THREAD_INFO(%rax)
64 movq %rsi,%rcx 36 movq %rsi,%rcx
65 addq %rdx,%rcx 37 addq %rdx,%rcx
@@ -71,14 +43,12 @@ ENTRY(_copy_from_user)
71 X86_FEATURE_REP_GOOD, \ 43 X86_FEATURE_REP_GOOD, \
72 "jmp copy_user_enhanced_fast_string", \ 44 "jmp copy_user_enhanced_fast_string", \
73 X86_FEATURE_ERMS 45 X86_FEATURE_ERMS
74 CFI_ENDPROC
75ENDPROC(_copy_from_user) 46ENDPROC(_copy_from_user)
76 47
77 .section .fixup,"ax" 48 .section .fixup,"ax"
78 /* must zero dest */ 49 /* must zero dest */
79ENTRY(bad_from_user) 50ENTRY(bad_from_user)
80bad_from_user: 51bad_from_user:
81 CFI_STARTPROC
82 movl %edx,%ecx 52 movl %edx,%ecx
83 xorl %eax,%eax 53 xorl %eax,%eax
84 rep 54 rep
@@ -86,7 +56,6 @@ bad_from_user:
86bad_to_user: 56bad_to_user:
87 movl %edx,%eax 57 movl %edx,%eax
88 ret 58 ret
89 CFI_ENDPROC
90ENDPROC(bad_from_user) 59ENDPROC(bad_from_user)
91 .previous 60 .previous
92 61
@@ -104,7 +73,6 @@ ENDPROC(bad_from_user)
104 * eax uncopied bytes or 0 if successful. 73 * eax uncopied bytes or 0 if successful.
105 */ 74 */
106ENTRY(copy_user_generic_unrolled) 75ENTRY(copy_user_generic_unrolled)
107 CFI_STARTPROC
108 ASM_STAC 76 ASM_STAC
109 cmpl $8,%edx 77 cmpl $8,%edx
 110 jb 20f /* less than 8 bytes, go to byte copy loop */ 78 jb 20f /* less than 8 bytes, go to byte copy loop */
@@ -186,7 +154,6 @@ ENTRY(copy_user_generic_unrolled)
186 _ASM_EXTABLE(19b,40b) 154 _ASM_EXTABLE(19b,40b)
187 _ASM_EXTABLE(21b,50b) 155 _ASM_EXTABLE(21b,50b)
188 _ASM_EXTABLE(22b,50b) 156 _ASM_EXTABLE(22b,50b)
189 CFI_ENDPROC
190ENDPROC(copy_user_generic_unrolled) 157ENDPROC(copy_user_generic_unrolled)
191 158
192/* Some CPUs run faster using the string copy instructions. 159/* Some CPUs run faster using the string copy instructions.
@@ -208,7 +175,6 @@ ENDPROC(copy_user_generic_unrolled)
208 * eax uncopied bytes or 0 if successful. 175 * eax uncopied bytes or 0 if successful.
209 */ 176 */
210ENTRY(copy_user_generic_string) 177ENTRY(copy_user_generic_string)
211 CFI_STARTPROC
212 ASM_STAC 178 ASM_STAC
213 cmpl $8,%edx 179 cmpl $8,%edx
214 jb 2f /* less than 8 bytes, go to byte copy loop */ 180 jb 2f /* less than 8 bytes, go to byte copy loop */
@@ -233,7 +199,6 @@ ENTRY(copy_user_generic_string)
233 199
234 _ASM_EXTABLE(1b,11b) 200 _ASM_EXTABLE(1b,11b)
235 _ASM_EXTABLE(3b,12b) 201 _ASM_EXTABLE(3b,12b)
236 CFI_ENDPROC
237ENDPROC(copy_user_generic_string) 202ENDPROC(copy_user_generic_string)
238 203
239/* 204/*
@@ -249,7 +214,6 @@ ENDPROC(copy_user_generic_string)
249 * eax uncopied bytes or 0 if successful. 214 * eax uncopied bytes or 0 if successful.
250 */ 215 */
251ENTRY(copy_user_enhanced_fast_string) 216ENTRY(copy_user_enhanced_fast_string)
252 CFI_STARTPROC
253 ASM_STAC 217 ASM_STAC
254 movl %edx,%ecx 218 movl %edx,%ecx
2551: rep 2191: rep
@@ -264,5 +228,94 @@ ENTRY(copy_user_enhanced_fast_string)
264 .previous 228 .previous
265 229
266 _ASM_EXTABLE(1b,12b) 230 _ASM_EXTABLE(1b,12b)
267 CFI_ENDPROC
268ENDPROC(copy_user_enhanced_fast_string) 231ENDPROC(copy_user_enhanced_fast_string)
232
233/*
234 * copy_user_nocache - Uncached memory copy with exception handling
235 * This will force destination/source out of cache for more performance.
236 */
237ENTRY(__copy_user_nocache)
238 ASM_STAC
239 cmpl $8,%edx
 240 jb 20f /* less than 8 bytes, go to byte copy loop */
241 ALIGN_DESTINATION
242 movl %edx,%ecx
243 andl $63,%edx
244 shrl $6,%ecx
245 jz 17f
2461: movq (%rsi),%r8
2472: movq 1*8(%rsi),%r9
2483: movq 2*8(%rsi),%r10
2494: movq 3*8(%rsi),%r11
2505: movnti %r8,(%rdi)
2516: movnti %r9,1*8(%rdi)
2527: movnti %r10,2*8(%rdi)
2538: movnti %r11,3*8(%rdi)
2549: movq 4*8(%rsi),%r8
25510: movq 5*8(%rsi),%r9
25611: movq 6*8(%rsi),%r10
25712: movq 7*8(%rsi),%r11
25813: movnti %r8,4*8(%rdi)
25914: movnti %r9,5*8(%rdi)
26015: movnti %r10,6*8(%rdi)
26116: movnti %r11,7*8(%rdi)
262 leaq 64(%rsi),%rsi
263 leaq 64(%rdi),%rdi
264 decl %ecx
265 jnz 1b
26617: movl %edx,%ecx
267 andl $7,%edx
268 shrl $3,%ecx
269 jz 20f
27018: movq (%rsi),%r8
27119: movnti %r8,(%rdi)
272 leaq 8(%rsi),%rsi
273 leaq 8(%rdi),%rdi
274 decl %ecx
275 jnz 18b
27620: andl %edx,%edx
277 jz 23f
278 movl %edx,%ecx
27921: movb (%rsi),%al
28022: movb %al,(%rdi)
281 incq %rsi
282 incq %rdi
283 decl %ecx
284 jnz 21b
28523: xorl %eax,%eax
286 ASM_CLAC
287 sfence
288 ret
289
290 .section .fixup,"ax"
29130: shll $6,%ecx
292 addl %ecx,%edx
293 jmp 60f
29440: lea (%rdx,%rcx,8),%rdx
295 jmp 60f
29650: movl %ecx,%edx
29760: sfence
298 jmp copy_user_handle_tail
299 .previous
300
301 _ASM_EXTABLE(1b,30b)
302 _ASM_EXTABLE(2b,30b)
303 _ASM_EXTABLE(3b,30b)
304 _ASM_EXTABLE(4b,30b)
305 _ASM_EXTABLE(5b,30b)
306 _ASM_EXTABLE(6b,30b)
307 _ASM_EXTABLE(7b,30b)
308 _ASM_EXTABLE(8b,30b)
309 _ASM_EXTABLE(9b,30b)
310 _ASM_EXTABLE(10b,30b)
311 _ASM_EXTABLE(11b,30b)
312 _ASM_EXTABLE(12b,30b)
313 _ASM_EXTABLE(13b,30b)
314 _ASM_EXTABLE(14b,30b)
315 _ASM_EXTABLE(15b,30b)
316 _ASM_EXTABLE(16b,30b)
317 _ASM_EXTABLE(18b,40b)
318 _ASM_EXTABLE(19b,40b)
319 _ASM_EXTABLE(21b,50b)
320 _ASM_EXTABLE(22b,50b)
321ENDPROC(__copy_user_nocache)
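
For readers following the __copy_user_nocache body merged in above: it copies 64-byte blocks with eight movq loads and eight movnti non-temporal stores, drops to 8-byte words and then single bytes for the tail, and the 30/40/50 fixup labels convert the faulting label back into a remaining-byte count before jumping to copy_user_handle_tail. A plain-C sketch of that structure (not part of the patch; the real routine additionally does ASM_STAC/ASM_CLAC and an sfence):

    #include <stddef.h>

    /* Sketch only: the shape of the copy, not the kernel's implementation. */
    static size_t nocache_copy_sketch(unsigned char *dst,
                                      const unsigned char *src, size_t len)
    {
            size_t i;

            while (len >= 64) {                 /* labels 1..16: 8 movq + 8 movnti */
                    for (i = 0; i < 64; i++)
                            dst[i] = src[i];
                    dst += 64; src += 64; len -= 64;
            }
            while (len >= 8) {                  /* labels 18/19: one quadword      */
                    for (i = 0; i < 8; i++)
                            dst[i] = src[i];
                    dst += 8; src += 8; len -= 8;
            }
            for (i = 0; i < len; i++)           /* labels 21/22: byte tail         */
                    dst[i] = src[i];

            return 0;                           /* uncopied bytes; 0 on success    */
    }
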
diff --git a/arch/x86/lib/copy_user_nocache_64.S b/arch/x86/lib/copy_user_nocache_64.S
deleted file mode 100644
index 6a4f43c2d9e6..000000000000
--- a/arch/x86/lib/copy_user_nocache_64.S
+++ /dev/null
@@ -1,136 +0,0 @@
1/*
2 * Copyright 2008 Vitaly Mayatskikh <vmayatsk@redhat.com>
3 * Copyright 2002 Andi Kleen, SuSE Labs.
4 * Subject to the GNU Public License v2.
5 *
6 * Functions to copy from and to user space.
7 */
8
9#include <linux/linkage.h>
10#include <asm/dwarf2.h>
11
12#define FIX_ALIGNMENT 1
13
14#include <asm/current.h>
15#include <asm/asm-offsets.h>
16#include <asm/thread_info.h>
17#include <asm/asm.h>
18#include <asm/smap.h>
19
20 .macro ALIGN_DESTINATION
21#ifdef FIX_ALIGNMENT
22 /* check for bad alignment of destination */
23 movl %edi,%ecx
24 andl $7,%ecx
25 jz 102f /* already aligned */
26 subl $8,%ecx
27 negl %ecx
28 subl %ecx,%edx
29100: movb (%rsi),%al
30101: movb %al,(%rdi)
31 incq %rsi
32 incq %rdi
33 decl %ecx
34 jnz 100b
35102:
36 .section .fixup,"ax"
37103: addl %ecx,%edx /* ecx is zerorest also */
38 jmp copy_user_handle_tail
39 .previous
40
41 _ASM_EXTABLE(100b,103b)
42 _ASM_EXTABLE(101b,103b)
43#endif
44 .endm
45
46/*
47 * copy_user_nocache - Uncached memory copy with exception handling
48 * This will force destination/source out of cache for more performance.
49 */
50ENTRY(__copy_user_nocache)
51 CFI_STARTPROC
52 ASM_STAC
53 cmpl $8,%edx
54 jb 20f /* less than 8 bytes, go to byte copy loop */
55 ALIGN_DESTINATION
56 movl %edx,%ecx
57 andl $63,%edx
58 shrl $6,%ecx
59 jz 17f
601: movq (%rsi),%r8
612: movq 1*8(%rsi),%r9
623: movq 2*8(%rsi),%r10
634: movq 3*8(%rsi),%r11
645: movnti %r8,(%rdi)
656: movnti %r9,1*8(%rdi)
667: movnti %r10,2*8(%rdi)
678: movnti %r11,3*8(%rdi)
689: movq 4*8(%rsi),%r8
6910: movq 5*8(%rsi),%r9
7011: movq 6*8(%rsi),%r10
7112: movq 7*8(%rsi),%r11
7213: movnti %r8,4*8(%rdi)
7314: movnti %r9,5*8(%rdi)
7415: movnti %r10,6*8(%rdi)
7516: movnti %r11,7*8(%rdi)
76 leaq 64(%rsi),%rsi
77 leaq 64(%rdi),%rdi
78 decl %ecx
79 jnz 1b
8017: movl %edx,%ecx
81 andl $7,%edx
82 shrl $3,%ecx
83 jz 20f
8418: movq (%rsi),%r8
8519: movnti %r8,(%rdi)
86 leaq 8(%rsi),%rsi
87 leaq 8(%rdi),%rdi
88 decl %ecx
89 jnz 18b
9020: andl %edx,%edx
91 jz 23f
92 movl %edx,%ecx
9321: movb (%rsi),%al
9422: movb %al,(%rdi)
95 incq %rsi
96 incq %rdi
97 decl %ecx
98 jnz 21b
9923: xorl %eax,%eax
100 ASM_CLAC
101 sfence
102 ret
103
104 .section .fixup,"ax"
10530: shll $6,%ecx
106 addl %ecx,%edx
107 jmp 60f
10840: lea (%rdx,%rcx,8),%rdx
109 jmp 60f
11050: movl %ecx,%edx
11160: sfence
112 jmp copy_user_handle_tail
113 .previous
114
115 _ASM_EXTABLE(1b,30b)
116 _ASM_EXTABLE(2b,30b)
117 _ASM_EXTABLE(3b,30b)
118 _ASM_EXTABLE(4b,30b)
119 _ASM_EXTABLE(5b,30b)
120 _ASM_EXTABLE(6b,30b)
121 _ASM_EXTABLE(7b,30b)
122 _ASM_EXTABLE(8b,30b)
123 _ASM_EXTABLE(9b,30b)
124 _ASM_EXTABLE(10b,30b)
125 _ASM_EXTABLE(11b,30b)
126 _ASM_EXTABLE(12b,30b)
127 _ASM_EXTABLE(13b,30b)
128 _ASM_EXTABLE(14b,30b)
129 _ASM_EXTABLE(15b,30b)
130 _ASM_EXTABLE(16b,30b)
131 _ASM_EXTABLE(18b,40b)
132 _ASM_EXTABLE(19b,40b)
133 _ASM_EXTABLE(21b,50b)
134 _ASM_EXTABLE(22b,50b)
135 CFI_ENDPROC
136ENDPROC(__copy_user_nocache)
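
The ALIGN_DESTINATION macro deleted above (and re-created in copy_user_64.S) copies just enough head bytes to make the destination 8-byte aligned before the wide loops run. Its arithmetic as a small C sketch, with an illustrative address (not part of the patch):

    /* Number of head bytes copied one at a time (ecx in the asm); the
     * remaining length (edx) shrinks by the same amount. */
    static unsigned long align_head_bytes(unsigned long dst)
    {
            unsigned long misalign = dst & 7;   /* movl %edi,%ecx; andl $7,%ecx */

            return misalign ? 8 - misalign : 0; /* subl $8,%ecx; negl %ecx      */
    }
    /* e.g. dst = 0x1005 -> 3 head bytes, then the 64-byte loop takes over */
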
diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 9734182966f3..7e48807b2fa1 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -6,7 +6,6 @@
6 * for more details. No warranty for anything given at all. 6 * for more details. No warranty for anything given at all.
7 */ 7 */
8#include <linux/linkage.h> 8#include <linux/linkage.h>
9#include <asm/dwarf2.h>
10#include <asm/errno.h> 9#include <asm/errno.h>
11#include <asm/asm.h> 10#include <asm/asm.h>
12 11
@@ -47,23 +46,16 @@
47 46
48 47
49ENTRY(csum_partial_copy_generic) 48ENTRY(csum_partial_copy_generic)
50 CFI_STARTPROC
51 cmpl $3*64, %edx 49 cmpl $3*64, %edx
52 jle .Lignore 50 jle .Lignore
53 51
54.Lignore: 52.Lignore:
55 subq $7*8, %rsp 53 subq $7*8, %rsp
56 CFI_ADJUST_CFA_OFFSET 7*8
57 movq %rbx, 2*8(%rsp) 54 movq %rbx, 2*8(%rsp)
58 CFI_REL_OFFSET rbx, 2*8
59 movq %r12, 3*8(%rsp) 55 movq %r12, 3*8(%rsp)
60 CFI_REL_OFFSET r12, 3*8
61 movq %r14, 4*8(%rsp) 56 movq %r14, 4*8(%rsp)
62 CFI_REL_OFFSET r14, 4*8
63 movq %r13, 5*8(%rsp) 57 movq %r13, 5*8(%rsp)
64 CFI_REL_OFFSET r13, 5*8
65 movq %rbp, 6*8(%rsp) 58 movq %rbp, 6*8(%rsp)
66 CFI_REL_OFFSET rbp, 6*8
67 59
68 movq %r8, (%rsp) 60 movq %r8, (%rsp)
69 movq %r9, 1*8(%rsp) 61 movq %r9, 1*8(%rsp)
@@ -206,22 +198,14 @@ ENTRY(csum_partial_copy_generic)
206 addl %ebx, %eax 198 addl %ebx, %eax
207 adcl %r9d, %eax /* carry */ 199 adcl %r9d, %eax /* carry */
208 200
209 CFI_REMEMBER_STATE
210.Lende: 201.Lende:
211 movq 2*8(%rsp), %rbx 202 movq 2*8(%rsp), %rbx
212 CFI_RESTORE rbx
213 movq 3*8(%rsp), %r12 203 movq 3*8(%rsp), %r12
214 CFI_RESTORE r12
215 movq 4*8(%rsp), %r14 204 movq 4*8(%rsp), %r14
216 CFI_RESTORE r14
217 movq 5*8(%rsp), %r13 205 movq 5*8(%rsp), %r13
218 CFI_RESTORE r13
219 movq 6*8(%rsp), %rbp 206 movq 6*8(%rsp), %rbp
220 CFI_RESTORE rbp
221 addq $7*8, %rsp 207 addq $7*8, %rsp
222 CFI_ADJUST_CFA_OFFSET -7*8
223 ret 208 ret
224 CFI_RESTORE_STATE
225 209
226 /* Exception handlers. Very simple, zeroing is done in the wrappers */ 210 /* Exception handlers. Very simple, zeroing is done in the wrappers */
227.Lbad_source: 211.Lbad_source:
@@ -237,5 +221,4 @@ ENTRY(csum_partial_copy_generic)
237 jz .Lende 221 jz .Lende
238 movl $-EFAULT, (%rax) 222 movl $-EFAULT, (%rax)
239 jmp .Lende 223 jmp .Lende
240 CFI_ENDPROC
241ENDPROC(csum_partial_copy_generic) 224ENDPROC(csum_partial_copy_generic)
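
For reference while reading the hunk above: csum_partial_copy_generic still builds the same 7*8-byte frame, only the DWARF CFI describing it is removed. Layout sketch derived from the movq offsets shown:

    /*
     * Frame built by the prologue above (subq $7*8, %rsp):
     *
     *   0(%rsp)   r8 (caller's error-pointer arg)    4*8(%rsp)  r14
     *   1*8(%rsp) r9 (caller's error-pointer arg)    5*8(%rsp)  r13
     *   2*8(%rsp) rbx                                6*8(%rsp)  rbp
     *   3*8(%rsp) r12
     *
     * The .Lende epilogue restores the callee-saved slots and then does
     * addq $7*8, %rsp; only the unwinder annotations are gone.
     */
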
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index a4512359656a..46668cda4ffd 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -26,7 +26,6 @@
26 */ 26 */
27 27
28#include <linux/linkage.h> 28#include <linux/linkage.h>
29#include <asm/dwarf2.h>
30#include <asm/page_types.h> 29#include <asm/page_types.h>
31#include <asm/errno.h> 30#include <asm/errno.h>
32#include <asm/asm-offsets.h> 31#include <asm/asm-offsets.h>
@@ -36,7 +35,6 @@
36 35
37 .text 36 .text
38ENTRY(__get_user_1) 37ENTRY(__get_user_1)
39 CFI_STARTPROC
40 GET_THREAD_INFO(%_ASM_DX) 38 GET_THREAD_INFO(%_ASM_DX)
41 cmp TI_addr_limit(%_ASM_DX),%_ASM_AX 39 cmp TI_addr_limit(%_ASM_DX),%_ASM_AX
42 jae bad_get_user 40 jae bad_get_user
@@ -45,11 +43,9 @@ ENTRY(__get_user_1)
45 xor %eax,%eax 43 xor %eax,%eax
46 ASM_CLAC 44 ASM_CLAC
47 ret 45 ret
48 CFI_ENDPROC
49ENDPROC(__get_user_1) 46ENDPROC(__get_user_1)
50 47
51ENTRY(__get_user_2) 48ENTRY(__get_user_2)
52 CFI_STARTPROC
53 add $1,%_ASM_AX 49 add $1,%_ASM_AX
54 jc bad_get_user 50 jc bad_get_user
55 GET_THREAD_INFO(%_ASM_DX) 51 GET_THREAD_INFO(%_ASM_DX)
@@ -60,11 +56,9 @@ ENTRY(__get_user_2)
60 xor %eax,%eax 56 xor %eax,%eax
61 ASM_CLAC 57 ASM_CLAC
62 ret 58 ret
63 CFI_ENDPROC
64ENDPROC(__get_user_2) 59ENDPROC(__get_user_2)
65 60
66ENTRY(__get_user_4) 61ENTRY(__get_user_4)
67 CFI_STARTPROC
68 add $3,%_ASM_AX 62 add $3,%_ASM_AX
69 jc bad_get_user 63 jc bad_get_user
70 GET_THREAD_INFO(%_ASM_DX) 64 GET_THREAD_INFO(%_ASM_DX)
@@ -75,11 +69,9 @@ ENTRY(__get_user_4)
75 xor %eax,%eax 69 xor %eax,%eax
76 ASM_CLAC 70 ASM_CLAC
77 ret 71 ret
78 CFI_ENDPROC
79ENDPROC(__get_user_4) 72ENDPROC(__get_user_4)
80 73
81ENTRY(__get_user_8) 74ENTRY(__get_user_8)
82 CFI_STARTPROC
83#ifdef CONFIG_X86_64 75#ifdef CONFIG_X86_64
84 add $7,%_ASM_AX 76 add $7,%_ASM_AX
85 jc bad_get_user 77 jc bad_get_user
@@ -104,28 +96,23 @@ ENTRY(__get_user_8)
104 ASM_CLAC 96 ASM_CLAC
105 ret 97 ret
106#endif 98#endif
107 CFI_ENDPROC
108ENDPROC(__get_user_8) 99ENDPROC(__get_user_8)
109 100
110 101
111bad_get_user: 102bad_get_user:
112 CFI_STARTPROC
113 xor %edx,%edx 103 xor %edx,%edx
114 mov $(-EFAULT),%_ASM_AX 104 mov $(-EFAULT),%_ASM_AX
115 ASM_CLAC 105 ASM_CLAC
116 ret 106 ret
117 CFI_ENDPROC
118END(bad_get_user) 107END(bad_get_user)
119 108
120#ifdef CONFIG_X86_32 109#ifdef CONFIG_X86_32
121bad_get_user_8: 110bad_get_user_8:
122 CFI_STARTPROC
123 xor %edx,%edx 111 xor %edx,%edx
124 xor %ecx,%ecx 112 xor %ecx,%ecx
125 mov $(-EFAULT),%_ASM_AX 113 mov $(-EFAULT),%_ASM_AX
126 ASM_CLAC 114 ASM_CLAC
127 ret 115 ret
128 CFI_ENDPROC
129END(bad_get_user_8) 116END(bad_get_user_8)
130#endif 117#endif
131 118
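
Each __get_user_N above keeps the same shape once the CFI lines are gone: range-check the pointer against the thread's addr_limit, ASM_STAC, the access itself covered by an exception-table entry, ASM_CLAC, return 0, with bad_get_user supplying -EFAULT. A rough C rendering of __get_user_1 (illustrative only; the real code recovers from faults through _ASM_EXTABLE rather than an early return):

    #include <errno.h>
    #include <stdint.h>

    /* addr_limit stands in for TI_addr_limit(%_ASM_DX). */
    static int get_user_1_sketch(const uint8_t *uaddr, uint8_t *val,
                                 uintptr_t addr_limit)
    {
            if ((uintptr_t)uaddr >= addr_limit)
                    return -EFAULT;             /* bad_get_user              */
            /* ASM_STAC */
            *val = *uaddr;                      /* 1: movzbl (%_ASM_AX),%edx */
            /* ASM_CLAC */
            return 0;
    }
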
diff --git a/arch/x86/lib/iomap_copy_64.S b/arch/x86/lib/iomap_copy_64.S
index 05a95e713da8..33147fef3452 100644
--- a/arch/x86/lib/iomap_copy_64.S
+++ b/arch/x86/lib/iomap_copy_64.S
@@ -16,15 +16,12 @@
16 */ 16 */
17 17
18#include <linux/linkage.h> 18#include <linux/linkage.h>
19#include <asm/dwarf2.h>
20 19
21/* 20/*
22 * override generic version in lib/iomap_copy.c 21 * override generic version in lib/iomap_copy.c
23 */ 22 */
24ENTRY(__iowrite32_copy) 23ENTRY(__iowrite32_copy)
25 CFI_STARTPROC
26 movl %edx,%ecx 24 movl %edx,%ecx
27 rep movsd 25 rep movsd
28 ret 26 ret
29 CFI_ENDPROC
30ENDPROC(__iowrite32_copy) 27ENDPROC(__iowrite32_copy)
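
__iowrite32_copy above is the rep movsd form of the generic helper in lib/iomap_copy.c. A usage sketch (chip_regs and FW_WINDOW are hypothetical names); note the count argument is in 32-bit words, not bytes:

    u32 fw[16] = { 0 };                 /* firmware words, illustrative */

    __iowrite32_copy(chip_regs + FW_WINDOW, fw, ARRAY_SIZE(fw));
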
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index b046664f5a1c..16698bba87de 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -2,7 +2,6 @@
2 2
3#include <linux/linkage.h> 3#include <linux/linkage.h>
4#include <asm/cpufeature.h> 4#include <asm/cpufeature.h>
5#include <asm/dwarf2.h>
6#include <asm/alternative-asm.h> 5#include <asm/alternative-asm.h>
7 6
8/* 7/*
@@ -53,7 +52,6 @@ ENTRY(memcpy_erms)
53ENDPROC(memcpy_erms) 52ENDPROC(memcpy_erms)
54 53
55ENTRY(memcpy_orig) 54ENTRY(memcpy_orig)
56 CFI_STARTPROC
57 movq %rdi, %rax 55 movq %rdi, %rax
58 56
59 cmpq $0x20, %rdx 57 cmpq $0x20, %rdx
@@ -178,5 +176,4 @@ ENTRY(memcpy_orig)
178 176
179.Lend: 177.Lend:
180 retq 178 retq
181 CFI_ENDPROC
182ENDPROC(memcpy_orig) 179ENDPROC(memcpy_orig)
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 0f8a0d0331b9..ca2afdd6d98e 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -6,7 +6,6 @@
6 * - Copyright 2011 Fenghua Yu <fenghua.yu@intel.com> 6 * - Copyright 2011 Fenghua Yu <fenghua.yu@intel.com>
7 */ 7 */
8#include <linux/linkage.h> 8#include <linux/linkage.h>
9#include <asm/dwarf2.h>
10#include <asm/cpufeature.h> 9#include <asm/cpufeature.h>
11#include <asm/alternative-asm.h> 10#include <asm/alternative-asm.h>
12 11
@@ -27,7 +26,6 @@
27 26
28ENTRY(memmove) 27ENTRY(memmove)
29ENTRY(__memmove) 28ENTRY(__memmove)
30 CFI_STARTPROC
31 29
32 /* Handle more 32 bytes in loop */ 30 /* Handle more 32 bytes in loop */
33 mov %rdi, %rax 31 mov %rdi, %rax
@@ -207,6 +205,5 @@ ENTRY(__memmove)
207 movb %r11b, (%rdi) 205 movb %r11b, (%rdi)
20813: 20613:
209 retq 207 retq
210 CFI_ENDPROC
211ENDPROC(__memmove) 208ENDPROC(__memmove)
212ENDPROC(memmove) 209ENDPROC(memmove)
diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 93118fb23976..2661fad05827 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -1,7 +1,6 @@
1/* Copyright 2002 Andi Kleen, SuSE Labs */ 1/* Copyright 2002 Andi Kleen, SuSE Labs */
2 2
3#include <linux/linkage.h> 3#include <linux/linkage.h>
4#include <asm/dwarf2.h>
5#include <asm/cpufeature.h> 4#include <asm/cpufeature.h>
6#include <asm/alternative-asm.h> 5#include <asm/alternative-asm.h>
7 6
@@ -66,7 +65,6 @@ ENTRY(memset_erms)
66ENDPROC(memset_erms) 65ENDPROC(memset_erms)
67 66
68ENTRY(memset_orig) 67ENTRY(memset_orig)
69 CFI_STARTPROC
70 movq %rdi,%r10 68 movq %rdi,%r10
71 69
72 /* expand byte value */ 70 /* expand byte value */
@@ -78,7 +76,6 @@ ENTRY(memset_orig)
78 movl %edi,%r9d 76 movl %edi,%r9d
79 andl $7,%r9d 77 andl $7,%r9d
80 jnz .Lbad_alignment 78 jnz .Lbad_alignment
81 CFI_REMEMBER_STATE
82.Lafter_bad_alignment: 79.Lafter_bad_alignment:
83 80
84 movq %rdx,%rcx 81 movq %rdx,%rcx
@@ -128,7 +125,6 @@ ENTRY(memset_orig)
128 movq %r10,%rax 125 movq %r10,%rax
129 ret 126 ret
130 127
131 CFI_RESTORE_STATE
132.Lbad_alignment: 128.Lbad_alignment:
133 cmpq $7,%rdx 129 cmpq $7,%rdx
134 jbe .Lhandle_7 130 jbe .Lhandle_7
@@ -139,5 +135,4 @@ ENTRY(memset_orig)
139 subq %r8,%rdx 135 subq %r8,%rdx
140 jmp .Lafter_bad_alignment 136 jmp .Lafter_bad_alignment
141.Lfinal: 137.Lfinal:
142 CFI_ENDPROC
143ENDPROC(memset_orig) 138ENDPROC(memset_orig)
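
memset_orig's "expand byte value" step, referenced by the hunk above, replicates the fill byte into all eight byte lanes so the main loop can store a quadword per iteration; in the lines not shown in this hunk the kernel does this with a movabs/imulq pair. Equivalent C (sketch only):

    #include <stdint.h>

    static uint64_t expand_fill_byte(uint8_t c)
    {
            return (uint64_t)c * 0x0101010101010101ULL;   /* movabs + imulq */
    }
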
diff --git a/arch/x86/lib/msr-reg.S b/arch/x86/lib/msr-reg.S
index 3ca5218fbece..c81556409bbb 100644
--- a/arch/x86/lib/msr-reg.S
+++ b/arch/x86/lib/msr-reg.S
@@ -1,6 +1,5 @@
1#include <linux/linkage.h> 1#include <linux/linkage.h>
2#include <linux/errno.h> 2#include <linux/errno.h>
3#include <asm/dwarf2.h>
4#include <asm/asm.h> 3#include <asm/asm.h>
5#include <asm/msr.h> 4#include <asm/msr.h>
6 5
@@ -13,9 +12,8 @@
13 */ 12 */
14.macro op_safe_regs op 13.macro op_safe_regs op
15ENTRY(\op\()_safe_regs) 14ENTRY(\op\()_safe_regs)
16 CFI_STARTPROC 15 pushq %rbx
17 pushq_cfi_reg rbx 16 pushq %rbp
18 pushq_cfi_reg rbp
19 movq %rdi, %r10 /* Save pointer */ 17 movq %rdi, %r10 /* Save pointer */
20 xorl %r11d, %r11d /* Return value */ 18 xorl %r11d, %r11d /* Return value */
21 movl (%rdi), %eax 19 movl (%rdi), %eax
@@ -25,7 +23,6 @@ ENTRY(\op\()_safe_regs)
25 movl 20(%rdi), %ebp 23 movl 20(%rdi), %ebp
26 movl 24(%rdi), %esi 24 movl 24(%rdi), %esi
27 movl 28(%rdi), %edi 25 movl 28(%rdi), %edi
28 CFI_REMEMBER_STATE
291: \op 261: \op
302: movl %eax, (%r10) 272: movl %eax, (%r10)
31 movl %r11d, %eax /* Return value */ 28 movl %r11d, %eax /* Return value */
@@ -35,16 +32,14 @@ ENTRY(\op\()_safe_regs)
35 movl %ebp, 20(%r10) 32 movl %ebp, 20(%r10)
36 movl %esi, 24(%r10) 33 movl %esi, 24(%r10)
37 movl %edi, 28(%r10) 34 movl %edi, 28(%r10)
38 popq_cfi_reg rbp 35 popq %rbp
39 popq_cfi_reg rbx 36 popq %rbx
40 ret 37 ret
413: 383:
42 CFI_RESTORE_STATE
43 movl $-EIO, %r11d 39 movl $-EIO, %r11d
44 jmp 2b 40 jmp 2b
45 41
46 _ASM_EXTABLE(1b, 3b) 42 _ASM_EXTABLE(1b, 3b)
47 CFI_ENDPROC
48ENDPROC(\op\()_safe_regs) 43ENDPROC(\op\()_safe_regs)
49.endm 44.endm
50 45
@@ -52,13 +47,12 @@ ENDPROC(\op\()_safe_regs)
52 47
53.macro op_safe_regs op 48.macro op_safe_regs op
54ENTRY(\op\()_safe_regs) 49ENTRY(\op\()_safe_regs)
55 CFI_STARTPROC 50 pushl %ebx
56 pushl_cfi_reg ebx 51 pushl %ebp
57 pushl_cfi_reg ebp 52 pushl %esi
58 pushl_cfi_reg esi 53 pushl %edi
59 pushl_cfi_reg edi 54 pushl $0 /* Return value */
60 pushl_cfi $0 /* Return value */ 55 pushl %eax
61 pushl_cfi %eax
62 movl 4(%eax), %ecx 56 movl 4(%eax), %ecx
63 movl 8(%eax), %edx 57 movl 8(%eax), %edx
64 movl 12(%eax), %ebx 58 movl 12(%eax), %ebx
@@ -66,32 +60,28 @@ ENTRY(\op\()_safe_regs)
66 movl 24(%eax), %esi 60 movl 24(%eax), %esi
67 movl 28(%eax), %edi 61 movl 28(%eax), %edi
68 movl (%eax), %eax 62 movl (%eax), %eax
69 CFI_REMEMBER_STATE
701: \op 631: \op
712: pushl_cfi %eax 642: pushl %eax
72 movl 4(%esp), %eax 65 movl 4(%esp), %eax
73 popl_cfi (%eax) 66 popl (%eax)
74 addl $4, %esp 67 addl $4, %esp
75 CFI_ADJUST_CFA_OFFSET -4
76 movl %ecx, 4(%eax) 68 movl %ecx, 4(%eax)
77 movl %edx, 8(%eax) 69 movl %edx, 8(%eax)
78 movl %ebx, 12(%eax) 70 movl %ebx, 12(%eax)
79 movl %ebp, 20(%eax) 71 movl %ebp, 20(%eax)
80 movl %esi, 24(%eax) 72 movl %esi, 24(%eax)
81 movl %edi, 28(%eax) 73 movl %edi, 28(%eax)
82 popl_cfi %eax 74 popl %eax
83 popl_cfi_reg edi 75 popl %edi
84 popl_cfi_reg esi 76 popl %esi
85 popl_cfi_reg ebp 77 popl %ebp
86 popl_cfi_reg ebx 78 popl %ebx
87 ret 79 ret
883: 803:
89 CFI_RESTORE_STATE
90 movl $-EIO, 4(%esp) 81 movl $-EIO, 4(%esp)
91 jmp 2b 82 jmp 2b
92 83
93 _ASM_EXTABLE(1b, 3b) 84 _ASM_EXTABLE(1b, 3b)
94 CFI_ENDPROC
95ENDPROC(\op\()_safe_regs) 85ENDPROC(\op\()_safe_regs)
96.endm 86.endm
97 87
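
The op_safe_regs thunks above exchange a block of eight 32-bit registers with memory at the offsets shown (0: eax, 4: ecx, 8: edx, 12: ebx, 20: ebp, 24: esi, 28: edi; offset 16 is unused) and return -EIO if the rdmsr/wrmsr faults. A kernel-context usage fragment (the MSR index and printouts are purely illustrative):

    u32 regs[8] = { 0 };

    regs[1] = 0x10;                     /* ecx: MSR index (TSC, illustrative)  */
    if (rdmsr_safe_regs(regs))          /* thunk generated by the macro above  */
            pr_warn("rdmsr faulted, -EIO\n");
    else
            pr_info("MSR 0x10 = %08x:%08x\n", regs[2], regs[0]);  /* edx:eax */
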
diff --git a/arch/x86/lib/putuser.S b/arch/x86/lib/putuser.S
index fc6ba17a7eec..e0817a12d323 100644
--- a/arch/x86/lib/putuser.S
+++ b/arch/x86/lib/putuser.S
@@ -11,7 +11,6 @@
11 * return value. 11 * return value.
12 */ 12 */
13#include <linux/linkage.h> 13#include <linux/linkage.h>
14#include <asm/dwarf2.h>
15#include <asm/thread_info.h> 14#include <asm/thread_info.h>
16#include <asm/errno.h> 15#include <asm/errno.h>
17#include <asm/asm.h> 16#include <asm/asm.h>
@@ -30,11 +29,9 @@
30 * as they get called from within inline assembly. 29 * as they get called from within inline assembly.
31 */ 30 */
32 31
33#define ENTER CFI_STARTPROC ; \ 32#define ENTER GET_THREAD_INFO(%_ASM_BX)
34 GET_THREAD_INFO(%_ASM_BX)
35#define EXIT ASM_CLAC ; \ 33#define EXIT ASM_CLAC ; \
36 ret ; \ 34 ret
37 CFI_ENDPROC
38 35
39.text 36.text
40ENTRY(__put_user_1) 37ENTRY(__put_user_1)
@@ -87,7 +84,6 @@ ENTRY(__put_user_8)
87ENDPROC(__put_user_8) 84ENDPROC(__put_user_8)
88 85
89bad_put_user: 86bad_put_user:
90 CFI_STARTPROC
91 movl $-EFAULT,%eax 87 movl $-EFAULT,%eax
92 EXIT 88 EXIT
93END(bad_put_user) 89END(bad_put_user)

diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
index 2322abe4da3b..40027db99140 100644
--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -15,7 +15,6 @@
15 15
16#include <linux/linkage.h> 16#include <linux/linkage.h>
17#include <asm/alternative-asm.h> 17#include <asm/alternative-asm.h>
18#include <asm/dwarf2.h>
19 18
20#define __ASM_HALF_REG(reg) __ASM_SEL(reg, e##reg) 19#define __ASM_HALF_REG(reg) __ASM_SEL(reg, e##reg)
21#define __ASM_HALF_SIZE(inst) __ASM_SEL(inst##w, inst##l) 20#define __ASM_HALF_SIZE(inst) __ASM_SEL(inst##w, inst##l)
@@ -34,10 +33,10 @@
34 */ 33 */
35 34
36#define save_common_regs \ 35#define save_common_regs \
37 pushl_cfi_reg ecx 36 pushl %ecx
38 37
39#define restore_common_regs \ 38#define restore_common_regs \
40 popl_cfi_reg ecx 39 popl %ecx
41 40
42 /* Avoid uglifying the argument copying x86-64 needs to do. */ 41 /* Avoid uglifying the argument copying x86-64 needs to do. */
43 .macro movq src, dst 42 .macro movq src, dst
@@ -64,50 +63,45 @@
64 */ 63 */
65 64
66#define save_common_regs \ 65#define save_common_regs \
67 pushq_cfi_reg rdi; \ 66 pushq %rdi; \
68 pushq_cfi_reg rsi; \ 67 pushq %rsi; \
69 pushq_cfi_reg rcx; \ 68 pushq %rcx; \
70 pushq_cfi_reg r8; \ 69 pushq %r8; \
71 pushq_cfi_reg r9; \ 70 pushq %r9; \
72 pushq_cfi_reg r10; \ 71 pushq %r10; \
73 pushq_cfi_reg r11 72 pushq %r11
74 73
75#define restore_common_regs \ 74#define restore_common_regs \
76 popq_cfi_reg r11; \ 75 popq %r11; \
77 popq_cfi_reg r10; \ 76 popq %r10; \
78 popq_cfi_reg r9; \ 77 popq %r9; \
79 popq_cfi_reg r8; \ 78 popq %r8; \
80 popq_cfi_reg rcx; \ 79 popq %rcx; \
81 popq_cfi_reg rsi; \ 80 popq %rsi; \
82 popq_cfi_reg rdi 81 popq %rdi
83 82
84#endif 83#endif
85 84
86/* Fix up special calling conventions */ 85/* Fix up special calling conventions */
87ENTRY(call_rwsem_down_read_failed) 86ENTRY(call_rwsem_down_read_failed)
88 CFI_STARTPROC
89 save_common_regs 87 save_common_regs
90 __ASM_SIZE(push,_cfi_reg) __ASM_REG(dx) 88 __ASM_SIZE(push,) %__ASM_REG(dx)
91 movq %rax,%rdi 89 movq %rax,%rdi
92 call rwsem_down_read_failed 90 call rwsem_down_read_failed
93 __ASM_SIZE(pop,_cfi_reg) __ASM_REG(dx) 91 __ASM_SIZE(pop,) %__ASM_REG(dx)
94 restore_common_regs 92 restore_common_regs
95 ret 93 ret
96 CFI_ENDPROC
97ENDPROC(call_rwsem_down_read_failed) 94ENDPROC(call_rwsem_down_read_failed)
98 95
99ENTRY(call_rwsem_down_write_failed) 96ENTRY(call_rwsem_down_write_failed)
100 CFI_STARTPROC
101 save_common_regs 97 save_common_regs
102 movq %rax,%rdi 98 movq %rax,%rdi
103 call rwsem_down_write_failed 99 call rwsem_down_write_failed
104 restore_common_regs 100 restore_common_regs
105 ret 101 ret
106 CFI_ENDPROC
107ENDPROC(call_rwsem_down_write_failed) 102ENDPROC(call_rwsem_down_write_failed)
108 103
109ENTRY(call_rwsem_wake) 104ENTRY(call_rwsem_wake)
110 CFI_STARTPROC
111 /* do nothing if still outstanding active readers */ 105 /* do nothing if still outstanding active readers */
112 __ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx) 106 __ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
113 jnz 1f 107 jnz 1f
@@ -116,17 +110,14 @@ ENTRY(call_rwsem_wake)
116 call rwsem_wake 110 call rwsem_wake
117 restore_common_regs 111 restore_common_regs
1181: ret 1121: ret
119 CFI_ENDPROC
120ENDPROC(call_rwsem_wake) 113ENDPROC(call_rwsem_wake)
121 114
122ENTRY(call_rwsem_downgrade_wake) 115ENTRY(call_rwsem_downgrade_wake)
123 CFI_STARTPROC
124 save_common_regs 116 save_common_regs
125 __ASM_SIZE(push,_cfi_reg) __ASM_REG(dx) 117 __ASM_SIZE(push,) %__ASM_REG(dx)
126 movq %rax,%rdi 118 movq %rax,%rdi
127 call rwsem_downgrade_wake 119 call rwsem_downgrade_wake
128 __ASM_SIZE(pop,_cfi_reg) __ASM_REG(dx) 120 __ASM_SIZE(pop,) %__ASM_REG(dx)
129 restore_common_regs 121 restore_common_regs
130 ret 122 ret
131 CFI_ENDPROC
132ENDPROC(call_rwsem_downgrade_wake) 123ENDPROC(call_rwsem_downgrade_wake)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 1d553186c434..8533b46e6bee 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -40,7 +40,7 @@
40 */ 40 */
41uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = { 41uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
42 [_PAGE_CACHE_MODE_WB ] = 0 | 0 , 42 [_PAGE_CACHE_MODE_WB ] = 0 | 0 ,
43 [_PAGE_CACHE_MODE_WC ] = _PAGE_PWT | 0 , 43 [_PAGE_CACHE_MODE_WC ] = 0 | _PAGE_PCD,
44 [_PAGE_CACHE_MODE_UC_MINUS] = 0 | _PAGE_PCD, 44 [_PAGE_CACHE_MODE_UC_MINUS] = 0 | _PAGE_PCD,
45 [_PAGE_CACHE_MODE_UC ] = _PAGE_PWT | _PAGE_PCD, 45 [_PAGE_CACHE_MODE_UC ] = _PAGE_PWT | _PAGE_PCD,
46 [_PAGE_CACHE_MODE_WT ] = 0 | _PAGE_PCD, 46 [_PAGE_CACHE_MODE_WT ] = 0 | _PAGE_PCD,
@@ -50,11 +50,11 @@ EXPORT_SYMBOL(__cachemode2pte_tbl);
50 50
51uint8_t __pte2cachemode_tbl[8] = { 51uint8_t __pte2cachemode_tbl[8] = {
52 [__pte2cm_idx( 0 | 0 | 0 )] = _PAGE_CACHE_MODE_WB, 52 [__pte2cm_idx( 0 | 0 | 0 )] = _PAGE_CACHE_MODE_WB,
53 [__pte2cm_idx(_PAGE_PWT | 0 | 0 )] = _PAGE_CACHE_MODE_WC, 53 [__pte2cm_idx(_PAGE_PWT | 0 | 0 )] = _PAGE_CACHE_MODE_UC_MINUS,
54 [__pte2cm_idx( 0 | _PAGE_PCD | 0 )] = _PAGE_CACHE_MODE_UC_MINUS, 54 [__pte2cm_idx( 0 | _PAGE_PCD | 0 )] = _PAGE_CACHE_MODE_UC_MINUS,
55 [__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | 0 )] = _PAGE_CACHE_MODE_UC, 55 [__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | 0 )] = _PAGE_CACHE_MODE_UC,
56 [__pte2cm_idx( 0 | 0 | _PAGE_PAT)] = _PAGE_CACHE_MODE_WB, 56 [__pte2cm_idx( 0 | 0 | _PAGE_PAT)] = _PAGE_CACHE_MODE_WB,
57 [__pte2cm_idx(_PAGE_PWT | 0 | _PAGE_PAT)] = _PAGE_CACHE_MODE_WC, 57 [__pte2cm_idx(_PAGE_PWT | 0 | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS,
58 [__pte2cm_idx(0 | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS, 58 [__pte2cm_idx(0 | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS,
59 [__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC, 59 [__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC,
60}; 60};
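
The two table edits above matter before pat_init() has programmed the PAT MSR, and on "nopat" systems where it never will: a write-combining request no longer sets PWT (which the power-on PAT maps to Write-Through) but PCD, which is UC- with or without PAT. A small worked example (not part of the patch):

    pgprot_t wc = __pgprot(cachemode2protval(_PAGE_CACHE_MODE_WC));
    /*   old table: _PAGE_PWT set -> PAT slot 1, WT in the power-on PAT     */
    /*   new table: _PAGE_PCD set -> PAT slot 2, UC- everywhere             */
    /* decoding changes the same way:                                       */
    /*   pgprot2cachemode(__pgprot(_PAGE_PWT)) now reports UC-, not WC      */
    /* pat_init() later rewrites both tables via update_cache_mode_entry(). */
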
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 2b7ece0e103a..9c0ff045fdd4 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -78,13 +78,13 @@ void __iomem *
78iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot) 78iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
79{ 79{
80 /* 80 /*
81 * For non-PAT systems, promote PAGE_KERNEL_WC to PAGE_KERNEL_UC_MINUS. 81 * For non-PAT systems, translate non-WB request to UC- just in
82 * PAGE_KERNEL_WC maps to PWT, which translates to uncached if the 82 * case the caller set the PWT bit to prot directly without using
83 * MTRR is UC or WC. UC_MINUS gets the real intention, of the 83 * pgprot_writecombine(). UC- translates to uncached if the MTRR
84 * user, which is "WC if the MTRR is WC, UC if you can't do that." 84 * is UC or WC. UC- gets the real intention, of the user, which is
85 * "WC if the MTRR is WC, UC if you can't do that."
85 */ 86 */
86 if (!pat_enabled && pgprot_val(prot) == 87 if (!pat_enabled() && pgprot2cachemode(prot) != _PAGE_CACHE_MODE_WB)
87 (__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
88 prot = __pgprot(__PAGE_KERNEL | 88 prot = __pgprot(__PAGE_KERNEL |
89 cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS)); 89 cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
90 90
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 27ff21216dfa..cc5ccc415cc0 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -42,6 +42,9 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
42 case _PAGE_CACHE_MODE_WC: 42 case _PAGE_CACHE_MODE_WC:
43 err = _set_memory_wc(vaddr, nrpages); 43 err = _set_memory_wc(vaddr, nrpages);
44 break; 44 break;
45 case _PAGE_CACHE_MODE_WT:
46 err = _set_memory_wt(vaddr, nrpages);
47 break;
45 case _PAGE_CACHE_MODE_WB: 48 case _PAGE_CACHE_MODE_WB:
46 err = _set_memory_wb(vaddr, nrpages); 49 err = _set_memory_wb(vaddr, nrpages);
47 break; 50 break;
@@ -172,6 +175,10 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
172 prot = __pgprot(pgprot_val(prot) | 175 prot = __pgprot(pgprot_val(prot) |
173 cachemode2protval(_PAGE_CACHE_MODE_WC)); 176 cachemode2protval(_PAGE_CACHE_MODE_WC));
174 break; 177 break;
178 case _PAGE_CACHE_MODE_WT:
179 prot = __pgprot(pgprot_val(prot) |
180 cachemode2protval(_PAGE_CACHE_MODE_WT));
181 break;
175 case _PAGE_CACHE_MODE_WB: 182 case _PAGE_CACHE_MODE_WB:
176 break; 183 break;
177 } 184 }
@@ -234,10 +241,11 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
234{ 241{
235 /* 242 /*
236 * Ideally, this should be: 243 * Ideally, this should be:
237 * pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS; 244 * pat_enabled() ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
238 * 245 *
239 * Till we fix all X drivers to use ioremap_wc(), we will use 246 * Till we fix all X drivers to use ioremap_wc(), we will use
240 * UC MINUS. 247 * UC MINUS. Drivers that are certain they need or can already
248 * be converted over to strong UC can use ioremap_uc().
241 */ 249 */
242 enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS; 250 enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
243 251
@@ -247,6 +255,39 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
247EXPORT_SYMBOL(ioremap_nocache); 255EXPORT_SYMBOL(ioremap_nocache);
248 256
249/** 257/**
258 * ioremap_uc - map bus memory into CPU space as strongly uncachable
259 * @phys_addr: bus address of the memory
260 * @size: size of the resource to map
261 *
262 * ioremap_uc performs a platform specific sequence of operations to
263 * make bus memory CPU accessible via the readb/readw/readl/writeb/
264 * writew/writel functions and the other mmio helpers. The returned
265 * address is not guaranteed to be usable directly as a virtual
266 * address.
267 *
268 * This version of ioremap ensures that the memory is marked with a strong
269 * preference as completely uncachable on the CPU when possible. For non-PAT
270 * systems this ends up setting page-attribute flags PCD=1, PWT=1. For PAT
271 * systems this will set the PAT entry for the pages as strong UC. This call
272 * will honor existing caching rules from things like the PCI bus. Note that
273 * there are other caches and buffers on many busses. In particular driver
274 * authors should read up on PCI writes.
275 *
276 * It's useful if some control registers are in such an area and
277 * write combining or read caching is not desirable:
278 *
279 * Must be freed with iounmap.
280 */
281void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
282{
283 enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
284
285 return __ioremap_caller(phys_addr, size, pcm,
286 __builtin_return_address(0));
287}
288EXPORT_SYMBOL_GPL(ioremap_uc);
289
290/**
250 * ioremap_wc - map memory into CPU space write combined 291 * ioremap_wc - map memory into CPU space write combined
251 * @phys_addr: bus address of the memory 292 * @phys_addr: bus address of the memory
252 * @size: size of the resource to map 293 * @size: size of the resource to map
@@ -258,14 +299,28 @@ EXPORT_SYMBOL(ioremap_nocache);
258 */ 299 */
259void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size) 300void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
260{ 301{
261 if (pat_enabled) 302 return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
262 return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
263 __builtin_return_address(0)); 303 __builtin_return_address(0));
264 else
265 return ioremap_nocache(phys_addr, size);
266} 304}
267EXPORT_SYMBOL(ioremap_wc); 305EXPORT_SYMBOL(ioremap_wc);
268 306
307/**
308 * ioremap_wt - map memory into CPU space write through
309 * @phys_addr: bus address of the memory
310 * @size: size of the resource to map
311 *
312 * This version of ioremap ensures that the memory is marked write through.
313 * Write through stores data into memory while keeping the cache up-to-date.
314 *
315 * Must be freed with iounmap.
316 */
317void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
318{
319 return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
320 __builtin_return_address(0));
321}
322EXPORT_SYMBOL(ioremap_wt);
323
269void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size) 324void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
270{ 325{
271 return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB, 326 return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
@@ -331,7 +386,7 @@ void iounmap(volatile void __iomem *addr)
331} 386}
332EXPORT_SYMBOL(iounmap); 387EXPORT_SYMBOL(iounmap);
333 388
334int arch_ioremap_pud_supported(void) 389int __init arch_ioremap_pud_supported(void)
335{ 390{
336#ifdef CONFIG_X86_64 391#ifdef CONFIG_X86_64
337 return cpu_has_gbpages; 392 return cpu_has_gbpages;
@@ -340,7 +395,7 @@ int arch_ioremap_pud_supported(void)
340#endif 395#endif
341} 396}
342 397
343int arch_ioremap_pmd_supported(void) 398int __init arch_ioremap_pmd_supported(void)
344{ 399{
345 return cpu_has_pse; 400 return cpu_has_pse;
346} 401}
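
A minimal usage sketch for the ioremap_wt()/ioremap_uc() interfaces added above; the physical address and size are purely illustrative and would normally come from a device resource:

    void __iomem *fb = ioremap_wt(0xfd000000, SZ_1M);  /* made-up address */

    if (!fb)
            return -ENOMEM;             /* e.g. inside a ->probe()          */
    writel(0x00ff00ff, fb);             /* write goes through to memory,    */
                                        /* the cached copy stays up-to-date */
    /* ... */
    iounmap(fb);
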
diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c
index 6629f397b467..8ff686aa7e8c 100644
--- a/arch/x86/mm/pageattr-test.c
+++ b/arch/x86/mm/pageattr-test.c
@@ -9,6 +9,7 @@
9#include <linux/random.h> 9#include <linux/random.h>
10#include <linux/kernel.h> 10#include <linux/kernel.h>
11#include <linux/mm.h> 11#include <linux/mm.h>
12#include <linux/vmalloc.h>
12 13
13#include <asm/cacheflush.h> 14#include <asm/cacheflush.h>
14#include <asm/pgtable.h> 15#include <asm/pgtable.h>
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 89af288ec674..727158cb3b3c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -14,6 +14,7 @@
14#include <linux/percpu.h> 14#include <linux/percpu.h>
15#include <linux/gfp.h> 15#include <linux/gfp.h>
16#include <linux/pci.h> 16#include <linux/pci.h>
17#include <linux/vmalloc.h>
17 18
18#include <asm/e820.h> 19#include <asm/e820.h>
19#include <asm/processor.h> 20#include <asm/processor.h>
@@ -129,16 +130,15 @@ within(unsigned long addr, unsigned long start, unsigned long end)
129 */ 130 */
130void clflush_cache_range(void *vaddr, unsigned int size) 131void clflush_cache_range(void *vaddr, unsigned int size)
131{ 132{
132 void *vend = vaddr + size - 1; 133 unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
134 void *vend = vaddr + size;
135 void *p;
133 136
134 mb(); 137 mb();
135 138
136 for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size) 139 for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
137 clflushopt(vaddr); 140 p < vend; p += boot_cpu_data.x86_clflush_size)
138 /* 141 clflushopt(p);
139 * Flush any possible final partial cacheline:
140 */
141 clflushopt(vend);
142 142
143 mb(); 143 mb();
144} 144}
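
Worked example of the rewritten loop bounds above, assuming a 64-byte cache line (boot_cpu_data.x86_clflush_size == 64); not part of the patch:

    clflush_cache_range((void *)0x1038, 0x30);
    /* clflush_mask = 63, p starts at 0x1000 (vaddr & ~63), vend = 0x1068  */
    /* -> clflushopt(0x1000), clflushopt(0x1040): each touched line, once  */
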
@@ -418,13 +418,11 @@ phys_addr_t slow_virt_to_phys(void *__virt_addr)
418 phys_addr_t phys_addr; 418 phys_addr_t phys_addr;
419 unsigned long offset; 419 unsigned long offset;
420 enum pg_level level; 420 enum pg_level level;
421 unsigned long psize;
422 unsigned long pmask; 421 unsigned long pmask;
423 pte_t *pte; 422 pte_t *pte;
424 423
425 pte = lookup_address(virt_addr, &level); 424 pte = lookup_address(virt_addr, &level);
426 BUG_ON(!pte); 425 BUG_ON(!pte);
427 psize = page_level_size(level);
428 pmask = page_level_mask(level); 426 pmask = page_level_mask(level);
429 offset = virt_addr & ~pmask; 427 offset = virt_addr & ~pmask;
430 phys_addr = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT; 428 phys_addr = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
@@ -1468,6 +1466,9 @@ int _set_memory_uc(unsigned long addr, int numpages)
1468{ 1466{
1469 /* 1467 /*
1470 * for now UC MINUS. see comments in ioremap_nocache() 1468 * for now UC MINUS. see comments in ioremap_nocache()
1469 * If you really need strong UC use ioremap_uc(), but note
1470 * that you cannot override IO areas with set_memory_*() as
1471 * these helpers cannot work with IO memory.
1471 */ 1472 */
1472 return change_page_attr_set(&addr, numpages, 1473 return change_page_attr_set(&addr, numpages,
1473 cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS), 1474 cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS),
@@ -1502,12 +1503,10 @@ EXPORT_SYMBOL(set_memory_uc);
1502static int _set_memory_array(unsigned long *addr, int addrinarray, 1503static int _set_memory_array(unsigned long *addr, int addrinarray,
1503 enum page_cache_mode new_type) 1504 enum page_cache_mode new_type)
1504{ 1505{
1506 enum page_cache_mode set_type;
1505 int i, j; 1507 int i, j;
1506 int ret; 1508 int ret;
1507 1509
1508 /*
1509 * for now UC MINUS. see comments in ioremap_nocache()
1510 */
1511 for (i = 0; i < addrinarray; i++) { 1510 for (i = 0; i < addrinarray; i++) {
1512 ret = reserve_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE, 1511 ret = reserve_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE,
1513 new_type, NULL); 1512 new_type, NULL);
@@ -1515,9 +1514,12 @@ static int _set_memory_array(unsigned long *addr, int addrinarray,
1515 goto out_free; 1514 goto out_free;
1516 } 1515 }
1517 1516
1517 /* If WC, set to UC- first and then WC */
1518 set_type = (new_type == _PAGE_CACHE_MODE_WC) ?
1519 _PAGE_CACHE_MODE_UC_MINUS : new_type;
1520
1518 ret = change_page_attr_set(addr, addrinarray, 1521 ret = change_page_attr_set(addr, addrinarray,
1519 cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS), 1522 cachemode2pgprot(set_type), 1);
1520 1);
1521 1523
1522 if (!ret && new_type == _PAGE_CACHE_MODE_WC) 1524 if (!ret && new_type == _PAGE_CACHE_MODE_WC)
1523 ret = change_page_attr_set_clr(addr, addrinarray, 1525 ret = change_page_attr_set_clr(addr, addrinarray,
@@ -1549,6 +1551,12 @@ int set_memory_array_wc(unsigned long *addr, int addrinarray)
1549} 1551}
1550EXPORT_SYMBOL(set_memory_array_wc); 1552EXPORT_SYMBOL(set_memory_array_wc);
1551 1553
1554int set_memory_array_wt(unsigned long *addr, int addrinarray)
1555{
1556 return _set_memory_array(addr, addrinarray, _PAGE_CACHE_MODE_WT);
1557}
1558EXPORT_SYMBOL_GPL(set_memory_array_wt);
1559
1552int _set_memory_wc(unsigned long addr, int numpages) 1560int _set_memory_wc(unsigned long addr, int numpages)
1553{ 1561{
1554 int ret; 1562 int ret;
@@ -1571,27 +1579,42 @@ int set_memory_wc(unsigned long addr, int numpages)
1571{ 1579{
1572 int ret; 1580 int ret;
1573 1581
1574 if (!pat_enabled)
1575 return set_memory_uc(addr, numpages);
1576
1577 ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE, 1582 ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
1578 _PAGE_CACHE_MODE_WC, NULL); 1583 _PAGE_CACHE_MODE_WC, NULL);
1579 if (ret) 1584 if (ret)
1580 goto out_err; 1585 return ret;
1581 1586
1582 ret = _set_memory_wc(addr, numpages); 1587 ret = _set_memory_wc(addr, numpages);
1583 if (ret) 1588 if (ret)
1584 goto out_free; 1589 free_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE);
1585
1586 return 0;
1587 1590
1588out_free:
1589 free_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE);
1590out_err:
1591 return ret; 1591 return ret;
1592} 1592}
1593EXPORT_SYMBOL(set_memory_wc); 1593EXPORT_SYMBOL(set_memory_wc);
1594 1594
1595int _set_memory_wt(unsigned long addr, int numpages)
1596{
1597 return change_page_attr_set(&addr, numpages,
1598 cachemode2pgprot(_PAGE_CACHE_MODE_WT), 0);
1599}
1600
1601int set_memory_wt(unsigned long addr, int numpages)
1602{
1603 int ret;
1604
1605 ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
1606 _PAGE_CACHE_MODE_WT, NULL);
1607 if (ret)
1608 return ret;
1609
1610 ret = _set_memory_wt(addr, numpages);
1611 if (ret)
1612 free_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE);
1613
1614 return ret;
1615}
1616EXPORT_SYMBOL_GPL(set_memory_wt);
1617
1595int _set_memory_wb(unsigned long addr, int numpages) 1618int _set_memory_wb(unsigned long addr, int numpages)
1596{ 1619{
1597 /* WB cache mode is hard wired to all cache attribute bits being 0 */ 1620 /* WB cache mode is hard wired to all cache attribute bits being 0 */
@@ -1682,6 +1705,7 @@ static int _set_pages_array(struct page **pages, int addrinarray,
1682{ 1705{
1683 unsigned long start; 1706 unsigned long start;
1684 unsigned long end; 1707 unsigned long end;
1708 enum page_cache_mode set_type;
1685 int i; 1709 int i;
1686 int free_idx; 1710 int free_idx;
1687 int ret; 1711 int ret;
@@ -1695,8 +1719,12 @@ static int _set_pages_array(struct page **pages, int addrinarray,
1695 goto err_out; 1719 goto err_out;
1696 } 1720 }
1697 1721
1722 /* If WC, set to UC- first and then WC */
1723 set_type = (new_type == _PAGE_CACHE_MODE_WC) ?
1724 _PAGE_CACHE_MODE_UC_MINUS : new_type;
1725
1698 ret = cpa_set_pages_array(pages, addrinarray, 1726 ret = cpa_set_pages_array(pages, addrinarray,
1699 cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS)); 1727 cachemode2pgprot(set_type));
1700 if (!ret && new_type == _PAGE_CACHE_MODE_WC) 1728 if (!ret && new_type == _PAGE_CACHE_MODE_WC)
1701 ret = change_page_attr_set_clr(NULL, addrinarray, 1729 ret = change_page_attr_set_clr(NULL, addrinarray,
1702 cachemode2pgprot( 1730 cachemode2pgprot(
@@ -1730,6 +1758,12 @@ int set_pages_array_wc(struct page **pages, int addrinarray)
1730} 1758}
1731EXPORT_SYMBOL(set_pages_array_wc); 1759EXPORT_SYMBOL(set_pages_array_wc);
1732 1760
1761int set_pages_array_wt(struct page **pages, int addrinarray)
1762{
1763 return _set_pages_array(pages, addrinarray, _PAGE_CACHE_MODE_WT);
1764}
1765EXPORT_SYMBOL_GPL(set_pages_array_wt);
1766
1733int set_pages_wb(struct page *page, int numpages) 1767int set_pages_wb(struct page *page, int numpages)
1734{ 1768{
1735 unsigned long addr = (unsigned long)page_address(page); 1769 unsigned long addr = (unsigned long)page_address(page);
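
The new set_memory_wt()/set_pages_array_wt() above follow the same reserve-then-change pattern as their WC counterparts. A hedged call-pattern sketch for a driver-owned buffer in the kernel direct mapping (buffer size and flags are illustrative):

    unsigned long buf = __get_free_pages(GFP_KERNEL, 2);  /* 4 pages, direct map */
    int ret;

    if (!buf)
            return -ENOMEM;
    ret = set_memory_wt(buf, 4);        /* reserve_memtype(WT) + rewrite PTEs */
    if (!ret) {
            /* ... write-through use of the buffer ... */
            set_memory_wb(buf, 4);      /* restore the default type           */
    }
    free_pages(buf, 2);
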
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af6771a95a..188e3e07eeeb 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -33,13 +33,17 @@
33#include "pat_internal.h" 33#include "pat_internal.h"
34#include "mm_internal.h" 34#include "mm_internal.h"
35 35
36#ifdef CONFIG_X86_PAT 36#undef pr_fmt
37int __read_mostly pat_enabled = 1; 37#define pr_fmt(fmt) "" fmt
38
39static bool boot_cpu_done;
40
41static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
38 42
39static inline void pat_disable(const char *reason) 43static inline void pat_disable(const char *reason)
40{ 44{
41 pat_enabled = 0; 45 __pat_enabled = 0;
42 printk(KERN_INFO "%s\n", reason); 46 pr_info("x86/PAT: %s\n", reason);
43} 47}
44 48
45static int __init nopat(char *str) 49static int __init nopat(char *str)
@@ -48,13 +52,12 @@ static int __init nopat(char *str)
48 return 0; 52 return 0;
49} 53}
50early_param("nopat", nopat); 54early_param("nopat", nopat);
51#else 55
52static inline void pat_disable(const char *reason) 56bool pat_enabled(void)
53{ 57{
54 (void)reason; 58 return !!__pat_enabled;
55} 59}
56#endif 60EXPORT_SYMBOL_GPL(pat_enabled);
57
58 61
59int pat_debug_enable; 62int pat_debug_enable;
60 63
@@ -65,22 +68,24 @@ static int __init pat_debug_setup(char *str)
65} 68}
66__setup("debugpat", pat_debug_setup); 69__setup("debugpat", pat_debug_setup);
67 70
68static u64 __read_mostly boot_pat_state;
69
70#ifdef CONFIG_X86_PAT 71#ifdef CONFIG_X86_PAT
71/* 72/*
72 * X86 PAT uses page flags WC and Uncached together to keep track of 73 * X86 PAT uses page flags arch_1 and uncached together to keep track of
73 * memory type of pages that have backing page struct. X86 PAT supports 3 74 * memory type of pages that have backing page struct.
74 * different memory types, _PAGE_CACHE_MODE_WB, _PAGE_CACHE_MODE_WC and 75 *
75 * _PAGE_CACHE_MODE_UC_MINUS and fourth state where page's memory type has not 76 * X86 PAT supports 4 different memory types:
76 * been changed from its default (value of -1 used to denote this). 77 * - _PAGE_CACHE_MODE_WB
77 * Note we do not support _PAGE_CACHE_MODE_UC here. 78 * - _PAGE_CACHE_MODE_WC
79 * - _PAGE_CACHE_MODE_UC_MINUS
80 * - _PAGE_CACHE_MODE_WT
81 *
82 * _PAGE_CACHE_MODE_WB is the default type.
78 */ 83 */
79 84
80#define _PGMT_DEFAULT 0 85#define _PGMT_WB 0
81#define _PGMT_WC (1UL << PG_arch_1) 86#define _PGMT_WC (1UL << PG_arch_1)
82#define _PGMT_UC_MINUS (1UL << PG_uncached) 87#define _PGMT_UC_MINUS (1UL << PG_uncached)
83#define _PGMT_WB (1UL << PG_uncached | 1UL << PG_arch_1) 88#define _PGMT_WT (1UL << PG_uncached | 1UL << PG_arch_1)
84#define _PGMT_MASK (1UL << PG_uncached | 1UL << PG_arch_1) 89#define _PGMT_MASK (1UL << PG_uncached | 1UL << PG_arch_1)
85#define _PGMT_CLEAR_MASK (~_PGMT_MASK) 90#define _PGMT_CLEAR_MASK (~_PGMT_MASK)
86 91
@@ -88,14 +93,14 @@ static inline enum page_cache_mode get_page_memtype(struct page *pg)
88{ 93{
89 unsigned long pg_flags = pg->flags & _PGMT_MASK; 94 unsigned long pg_flags = pg->flags & _PGMT_MASK;
90 95
91 if (pg_flags == _PGMT_DEFAULT) 96 if (pg_flags == _PGMT_WB)
92 return -1; 97 return _PAGE_CACHE_MODE_WB;
93 else if (pg_flags == _PGMT_WC) 98 else if (pg_flags == _PGMT_WC)
94 return _PAGE_CACHE_MODE_WC; 99 return _PAGE_CACHE_MODE_WC;
95 else if (pg_flags == _PGMT_UC_MINUS) 100 else if (pg_flags == _PGMT_UC_MINUS)
96 return _PAGE_CACHE_MODE_UC_MINUS; 101 return _PAGE_CACHE_MODE_UC_MINUS;
97 else 102 else
98 return _PAGE_CACHE_MODE_WB; 103 return _PAGE_CACHE_MODE_WT;
99} 104}
100 105
101static inline void set_page_memtype(struct page *pg, 106static inline void set_page_memtype(struct page *pg,
@@ -112,11 +117,12 @@ static inline void set_page_memtype(struct page *pg,
112 case _PAGE_CACHE_MODE_UC_MINUS: 117 case _PAGE_CACHE_MODE_UC_MINUS:
113 memtype_flags = _PGMT_UC_MINUS; 118 memtype_flags = _PGMT_UC_MINUS;
114 break; 119 break;
115 case _PAGE_CACHE_MODE_WB: 120 case _PAGE_CACHE_MODE_WT:
116 memtype_flags = _PGMT_WB; 121 memtype_flags = _PGMT_WT;
117 break; 122 break;
123 case _PAGE_CACHE_MODE_WB:
118 default: 124 default:
119 memtype_flags = _PGMT_DEFAULT; 125 memtype_flags = _PGMT_WB;
120 break; 126 break;
121 } 127 }
122 128
@@ -174,78 +180,154 @@ static enum page_cache_mode pat_get_cache_mode(unsigned pat_val, char *msg)
174 * configuration. 180 * configuration.
175 * Using lower indices is preferred, so we start with highest index. 181 * Using lower indices is preferred, so we start with highest index.
176 */ 182 */
177void pat_init_cache_modes(void) 183void pat_init_cache_modes(u64 pat)
178{ 184{
179 int i;
180 enum page_cache_mode cache; 185 enum page_cache_mode cache;
181 char pat_msg[33]; 186 char pat_msg[33];
182 u64 pat; 187 int i;
183 188
184 rdmsrl(MSR_IA32_CR_PAT, pat);
185 pat_msg[32] = 0; 189 pat_msg[32] = 0;
186 for (i = 7; i >= 0; i--) { 190 for (i = 7; i >= 0; i--) {
187 cache = pat_get_cache_mode((pat >> (i * 8)) & 7, 191 cache = pat_get_cache_mode((pat >> (i * 8)) & 7,
188 pat_msg + 4 * i); 192 pat_msg + 4 * i);
189 update_cache_mode_entry(i, cache); 193 update_cache_mode_entry(i, cache);
190 } 194 }
191 pr_info("PAT configuration [0-7]: %s\n", pat_msg); 195 pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
192} 196}
193 197
194#define PAT(x, y) ((u64)PAT_ ## y << ((x)*8)) 198#define PAT(x, y) ((u64)PAT_ ## y << ((x)*8))
195 199
196void pat_init(void) 200static void pat_bsp_init(u64 pat)
197{ 201{
198 u64 pat; 202 u64 tmp_pat;
199 bool boot_cpu = !boot_pat_state;
200 203
201 if (!pat_enabled) 204 if (!cpu_has_pat) {
205 pat_disable("PAT not supported by CPU.");
202 return; 206 return;
207 }
203 208
204 if (!cpu_has_pat) { 209 if (!pat_enabled())
205 if (!boot_pat_state) { 210 goto done;
206 pat_disable("PAT not supported by CPU."); 211
207 return; 212 rdmsrl(MSR_IA32_CR_PAT, tmp_pat);
208 } else { 213 if (!tmp_pat) {
209 /* 214 pat_disable("PAT MSR is 0, disabled.");
210 * If this happens we are on a secondary CPU, but 215 return;
211 * switched to PAT on the boot CPU. We have no way to
212 * undo PAT.
213 */
214 printk(KERN_ERR "PAT enabled, "
215 "but not supported by secondary CPU\n");
216 BUG();
217 }
218 } 216 }
219 217
220 /* Set PWT to Write-Combining. All other bits stay the same */ 218 wrmsrl(MSR_IA32_CR_PAT, pat);
221 /* 219
222 * PTE encoding used in Linux: 220done:
223 * PAT 221 pat_init_cache_modes(pat);
224 * |PCD 222}
225 * ||PWT 223
226 * ||| 224static void pat_ap_init(u64 pat)
227 * 000 WB _PAGE_CACHE_WB 225{
228 * 001 WC _PAGE_CACHE_WC 226 if (!pat_enabled())
229 * 010 UC- _PAGE_CACHE_UC_MINUS 227 return;
230 * 011 UC _PAGE_CACHE_UC 228
231 * PAT bit unused 229 if (!cpu_has_pat) {
232 */ 230 /*
233 pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) | 231 * If this happens we are on a secondary CPU, but switched to
234 PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC); 232 * PAT on the boot CPU. We have no way to undo PAT.
235 233 */
236 /* Boot CPU check */ 234 panic("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
237 if (!boot_pat_state) {
238 rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
239 if (!boot_pat_state) {
240 pat_disable("PAT read returns always zero, disabled.");
241 return;
242 }
243 } 235 }
244 236
245 wrmsrl(MSR_IA32_CR_PAT, pat); 237 wrmsrl(MSR_IA32_CR_PAT, pat);
238}
239
240void pat_init(void)
241{
242 u64 pat;
243 struct cpuinfo_x86 *c = &boot_cpu_data;
244
245 if (!pat_enabled()) {
246 /*
247 * No PAT. Emulate the PAT table that corresponds to the two
248 * cache bits, PWT (Write Through) and PCD (Cache Disable). This
249 * setup is the same as the BIOS default setup when the system
250 * has PAT but the "nopat" boot option has been specified. This
251 * emulated PAT table is used when MSR_IA32_CR_PAT returns 0.
252 *
253 * PTE encoding:
254 *
255 * PCD
256 * |PWT PAT
257 * || slot
258 * 00 0 WB : _PAGE_CACHE_MODE_WB
259 * 01 1 WT : _PAGE_CACHE_MODE_WT
260 * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
261 * 11 3 UC : _PAGE_CACHE_MODE_UC
262 *
263 * NOTE: When WC or WP is used, it is redirected to UC- per
264 * the default setup in __cachemode2pte_tbl[].
265 */
266 pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
267 PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
246 268
247 if (boot_cpu) 269 } else if ((c->x86_vendor == X86_VENDOR_INTEL) &&
248 pat_init_cache_modes(); 270 (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
271 ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
272 /*
273 * PAT support with the lower four entries. Intel Pentium 2,
274 * 3, M, and 4 are affected by PAT errata, which makes the
275 * upper four entries unusable. To be on the safe side, we don't
276 * use those.
277 *
278 * PTE encoding:
279 * PAT
280 * |PCD
281 * ||PWT PAT
282 * ||| slot
283 * 000 0 WB : _PAGE_CACHE_MODE_WB
284 * 001 1 WC : _PAGE_CACHE_MODE_WC
285 * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
286 * 011 3 UC : _PAGE_CACHE_MODE_UC
287 * PAT bit unused
288 *
289 * NOTE: When WT or WP is used, it is redirected to UC- per
290 * the default setup in __cachemode2pte_tbl[].
291 */
292 pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
293 PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
294 } else {
295 /*
296 * Full PAT support. We put WT in slot 7 to improve
297 * robustness in the presence of errata that might cause
298 * the high PAT bit to be ignored. This way, a buggy slot 7
299 * access will hit slot 3, and slot 3 is UC, so at worst
300 * we lose performance without causing a correctness issue.
301 * Pentium 4 erratum N46 is an example for such an erratum,
302 * although we try not to use PAT at all on affected CPUs.
303 *
304 * PTE encoding:
305 * PAT
306 * |PCD
307 * ||PWT PAT
308 * ||| slot
309 * 000 0 WB : _PAGE_CACHE_MODE_WB
310 * 001 1 WC : _PAGE_CACHE_MODE_WC
311 * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
312 * 011 3 UC : _PAGE_CACHE_MODE_UC
313 * 100 4 WB : Reserved
314 * 101 5 WC : Reserved
315 * 110 6 UC-: Reserved
316 * 111 7 WT : _PAGE_CACHE_MODE_WT
317 *
318 * The reserved slots are unused, but mapped to their
319 * corresponding types in the presence of PAT errata.
320 */
321 pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
322 PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
323 }
324
325 if (!boot_cpu_done) {
326 pat_bsp_init(pat);
327 boot_cpu_done = true;
328 } else {
329 pat_ap_init(pat);
330 }
249} 331}
250 332
251#undef PAT 333#undef PAT
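
For the full-PAT branch above, the PAT() composition resolves to a single MSR value; a worked example using the architectural PAT encodings from the Intel SDM (UC=0, WC=1, WT=4, WB=6, UC-=7), not part of the patch:

    /*
     * slot:   7    6    5    4    3    2    1    0
     * type:   WT   UC-  WC   WB   UC   UC-  WC   WB
     * byte:  0x04 0x07 0x01 0x06 0x00 0x07 0x01 0x06
     */
    wrmsrl(MSR_IA32_CR_PAT, 0x0407010600070106ull);  /* what pat_bsp_init() writes */
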
@@ -267,9 +349,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
267 * request is for WB. 349 * request is for WB.
268 */ 350 */
269 if (req_type == _PAGE_CACHE_MODE_WB) { 351 if (req_type == _PAGE_CACHE_MODE_WB) {
270 u8 mtrr_type; 352 u8 mtrr_type, uniform;
271 353
272 mtrr_type = mtrr_type_lookup(start, end); 354 mtrr_type = mtrr_type_lookup(start, end, &uniform);
273 if (mtrr_type != MTRR_TYPE_WRBACK) 355 if (mtrr_type != MTRR_TYPE_WRBACK)
274 return _PAGE_CACHE_MODE_UC_MINUS; 356 return _PAGE_CACHE_MODE_UC_MINUS;
275 357
@@ -324,9 +406,14 @@ static int pat_pagerange_is_ram(resource_size_t start, resource_size_t end)
324 406
325/* 407/*
326 * For RAM pages, we use page flags to mark the pages with appropriate type. 408 * For RAM pages, we use page flags to mark the pages with appropriate type.
327 * Here we do two pass: 409 * The page flags are limited to four types, WB (default), WC, WT and UC-.
328 * - Find the memtype of all the pages in the range, look for any conflicts 410 * WP request fails with -EINVAL, and UC gets redirected to UC-. Setting
329 * - In case of no conflicts, set the new memtype for pages in the range 411 * a new memory type is only allowed for a page mapped with the default WB
412 * type.
413 *
414 * Here we do two passes:
415 * - Find the memtype of all the pages in the range, look for any conflicts.
416 * - In case of no conflicts, set the new memtype for pages in the range.
330 */ 417 */
331static int reserve_ram_pages_type(u64 start, u64 end, 418static int reserve_ram_pages_type(u64 start, u64 end,
332 enum page_cache_mode req_type, 419 enum page_cache_mode req_type,
@@ -335,6 +422,12 @@ static int reserve_ram_pages_type(u64 start, u64 end,
335 struct page *page; 422 struct page *page;
336 u64 pfn; 423 u64 pfn;
337 424
425 if (req_type == _PAGE_CACHE_MODE_WP) {
426 if (new_type)
427 *new_type = _PAGE_CACHE_MODE_UC_MINUS;
428 return -EINVAL;
429 }
430
338 if (req_type == _PAGE_CACHE_MODE_UC) { 431 if (req_type == _PAGE_CACHE_MODE_UC) {
339 /* We do not support strong UC */ 432 /* We do not support strong UC */
340 WARN_ON_ONCE(1); 433 WARN_ON_ONCE(1);
@@ -346,8 +439,8 @@ static int reserve_ram_pages_type(u64 start, u64 end,
346 439
347 page = pfn_to_page(pfn); 440 page = pfn_to_page(pfn);
348 type = get_page_memtype(page); 441 type = get_page_memtype(page);
349 if (type != -1) { 442 if (type != _PAGE_CACHE_MODE_WB) {
350 pr_info("reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n", 443 pr_info("x86/PAT: reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
351 start, end - 1, type, req_type); 444 start, end - 1, type, req_type);
352 if (new_type) 445 if (new_type)
353 *new_type = type; 446 *new_type = type;
@@ -373,7 +466,7 @@ static int free_ram_pages_type(u64 start, u64 end)
373 466
374 for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) { 467 for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
375 page = pfn_to_page(pfn); 468 page = pfn_to_page(pfn);
376 set_page_memtype(page, -1); 469 set_page_memtype(page, _PAGE_CACHE_MODE_WB);
377 } 470 }
378 return 0; 471 return 0;
379} 472}
@@ -384,6 +477,7 @@ static int free_ram_pages_type(u64 start, u64 end)
384 * - _PAGE_CACHE_MODE_WC 477 * - _PAGE_CACHE_MODE_WC
385 * - _PAGE_CACHE_MODE_UC_MINUS 478 * - _PAGE_CACHE_MODE_UC_MINUS
386 * - _PAGE_CACHE_MODE_UC 479 * - _PAGE_CACHE_MODE_UC
480 * - _PAGE_CACHE_MODE_WT
387 * 481 *
388 * If new_type is NULL, function will return an error if it cannot reserve the 482 * If new_type is NULL, function will return an error if it cannot reserve the
389 * region with req_type. If new_type is non-NULL, function will return 483 * region with req_type. If new_type is non-NULL, function will return
@@ -400,14 +494,10 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
400 494
401 BUG_ON(start >= end); /* end is exclusive */ 495 BUG_ON(start >= end); /* end is exclusive */
402 496
403 if (!pat_enabled) { 497 if (!pat_enabled()) {
404 /* This is identical to page table setting without PAT */ 498 /* This is identical to page table setting without PAT */
405 if (new_type) { 499 if (new_type)
406 if (req_type == _PAGE_CACHE_MODE_WC) 500 *new_type = req_type;
407 *new_type = _PAGE_CACHE_MODE_UC_MINUS;
408 else
409 *new_type = req_type;
410 }
411 return 0; 501 return 0;
412 } 502 }
413 503
@@ -451,9 +541,9 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
451 541
452 err = rbt_memtype_check_insert(new, new_type); 542 err = rbt_memtype_check_insert(new, new_type);
453 if (err) { 543 if (err) {
454 printk(KERN_INFO "reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n", 544 pr_info("x86/PAT: reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
455 start, end - 1, 545 start, end - 1,
456 cattr_name(new->type), cattr_name(req_type)); 546 cattr_name(new->type), cattr_name(req_type));
457 kfree(new); 547 kfree(new);
458 spin_unlock(&memtype_lock); 548 spin_unlock(&memtype_lock);
459 549
@@ -475,7 +565,7 @@ int free_memtype(u64 start, u64 end)
475 int is_range_ram; 565 int is_range_ram;
476 struct memtype *entry; 566 struct memtype *entry;
477 567
478 if (!pat_enabled) 568 if (!pat_enabled())
479 return 0; 569 return 0;
480 570
481 /* Low ISA region is always mapped WB. No need to track */ 571 /* Low ISA region is always mapped WB. No need to track */
@@ -497,8 +587,8 @@ int free_memtype(u64 start, u64 end)
497 spin_unlock(&memtype_lock); 587 spin_unlock(&memtype_lock);
498 588
499 if (!entry) { 589 if (!entry) {
500 printk(KERN_INFO "%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n", 590 pr_info("x86/PAT: %s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
501 current->comm, current->pid, start, end - 1); 591 current->comm, current->pid, start, end - 1);
502 return -EINVAL; 592 return -EINVAL;
503 } 593 }
504 594
@@ -517,7 +607,7 @@ int free_memtype(u64 start, u64 end)
517 * Only to be called when PAT is enabled 607 * Only to be called when PAT is enabled
518 * 608 *
519 * Returns _PAGE_CACHE_MODE_WB, _PAGE_CACHE_MODE_WC, _PAGE_CACHE_MODE_UC_MINUS 609 * Returns _PAGE_CACHE_MODE_WB, _PAGE_CACHE_MODE_WC, _PAGE_CACHE_MODE_UC_MINUS
520 * or _PAGE_CACHE_MODE_UC 610 * or _PAGE_CACHE_MODE_WT.
521 */ 611 */
522static enum page_cache_mode lookup_memtype(u64 paddr) 612static enum page_cache_mode lookup_memtype(u64 paddr)
523{ 613{
@@ -529,16 +619,9 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
529 619
530 if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) { 620 if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) {
531 struct page *page; 621 struct page *page;
532 page = pfn_to_page(paddr >> PAGE_SHIFT);
533 rettype = get_page_memtype(page);
534 /*
535 * -1 from get_page_memtype() implies RAM page is in its
536 * default state and not reserved, and hence of type WB
537 */
538 if (rettype == -1)
539 rettype = _PAGE_CACHE_MODE_WB;
540 622
541 return rettype; 623 page = pfn_to_page(paddr >> PAGE_SHIFT);
624 return get_page_memtype(page);
542 } 625 }
543 626
544 spin_lock(&memtype_lock); 627 spin_lock(&memtype_lock);
@@ -623,13 +706,13 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
623 u64 to = from + size; 706 u64 to = from + size;
624 u64 cursor = from; 707 u64 cursor = from;
625 708
626 if (!pat_enabled) 709 if (!pat_enabled())
627 return 1; 710 return 1;
628 711
629 while (cursor < to) { 712 while (cursor < to) {
630 if (!devmem_is_allowed(pfn)) { 713 if (!devmem_is_allowed(pfn)) {
631 printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n", 714 pr_info("x86/PAT: Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
632 current->comm, from, to - 1); 715 current->comm, from, to - 1);
633 return 0; 716 return 0;
634 } 717 }
635 cursor += PAGE_SIZE; 718 cursor += PAGE_SIZE;
@@ -659,7 +742,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
659 * caching for the high addresses through the KEN pin, but 742 * caching for the high addresses through the KEN pin, but
660 * we maintain the tradition of paranoia in this code. 743 * we maintain the tradition of paranoia in this code.
661 */ 744 */
662 if (!pat_enabled && 745 if (!pat_enabled() &&
663 !(boot_cpu_has(X86_FEATURE_MTRR) || 746 !(boot_cpu_has(X86_FEATURE_MTRR) ||
664 boot_cpu_has(X86_FEATURE_K6_MTRR) || 747 boot_cpu_has(X86_FEATURE_K6_MTRR) ||
665 boot_cpu_has(X86_FEATURE_CYRIX_ARR) || 748 boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
@@ -698,8 +781,7 @@ int kernel_map_sync_memtype(u64 base, unsigned long size,
698 size; 781 size;
699 782
700 if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) { 783 if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) {
701 printk(KERN_INFO "%s:%d ioremap_change_attr failed %s " 784 pr_info("x86/PAT: %s:%d ioremap_change_attr failed %s for [mem %#010Lx-%#010Lx]\n",
702 "for [mem %#010Lx-%#010Lx]\n",
703 current->comm, current->pid, 785 current->comm, current->pid,
704 cattr_name(pcm), 786 cattr_name(pcm),
705 base, (unsigned long long)(base + size-1)); 787 base, (unsigned long long)(base + size-1));
@@ -729,12 +811,12 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
729 * the type requested matches the type of first page in the range. 811 * the type requested matches the type of first page in the range.
730 */ 812 */
731 if (is_ram) { 813 if (is_ram) {
732 if (!pat_enabled) 814 if (!pat_enabled())
733 return 0; 815 return 0;
734 816
735 pcm = lookup_memtype(paddr); 817 pcm = lookup_memtype(paddr);
736 if (want_pcm != pcm) { 818 if (want_pcm != pcm) {
737 printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n", 819 pr_warn("x86/PAT: %s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
738 current->comm, current->pid, 820 current->comm, current->pid,
739 cattr_name(want_pcm), 821 cattr_name(want_pcm),
740 (unsigned long long)paddr, 822 (unsigned long long)paddr,
@@ -755,13 +837,12 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
755 if (strict_prot || 837 if (strict_prot ||
756 !is_new_memtype_allowed(paddr, size, want_pcm, pcm)) { 838 !is_new_memtype_allowed(paddr, size, want_pcm, pcm)) {
757 free_memtype(paddr, paddr + size); 839 free_memtype(paddr, paddr + size);
758 printk(KERN_ERR "%s:%d map pfn expected mapping type %s" 840 pr_err("x86/PAT: %s:%d map pfn expected mapping type %s for [mem %#010Lx-%#010Lx], got %s\n",
759 " for [mem %#010Lx-%#010Lx], got %s\n", 841 current->comm, current->pid,
760 current->comm, current->pid, 842 cattr_name(want_pcm),
761 cattr_name(want_pcm), 843 (unsigned long long)paddr,
762 (unsigned long long)paddr, 844 (unsigned long long)(paddr + size - 1),
763 (unsigned long long)(paddr + size - 1), 845 cattr_name(pcm));
764 cattr_name(pcm));
765 return -EINVAL; 846 return -EINVAL;
766 } 847 }
767 /* 848 /*
@@ -844,7 +925,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
844 return ret; 925 return ret;
845 } 926 }
846 927
847 if (!pat_enabled) 928 if (!pat_enabled())
848 return 0; 929 return 0;
849 930
850 /* 931 /*
@@ -872,7 +953,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
872{ 953{
873 enum page_cache_mode pcm; 954 enum page_cache_mode pcm;
874 955
875 if (!pat_enabled) 956 if (!pat_enabled())
876 return 0; 957 return 0;
877 958
878 /* Set prot based on lookup */ 959 /* Set prot based on lookup */
@@ -913,14 +994,18 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
913 994
914pgprot_t pgprot_writecombine(pgprot_t prot) 995pgprot_t pgprot_writecombine(pgprot_t prot)
915{ 996{
916 if (pat_enabled) 997 return __pgprot(pgprot_val(prot) |
917 return __pgprot(pgprot_val(prot) |
918 cachemode2protval(_PAGE_CACHE_MODE_WC)); 998 cachemode2protval(_PAGE_CACHE_MODE_WC));
919 else
920 return pgprot_noncached(prot);
921} 999}
922EXPORT_SYMBOL_GPL(pgprot_writecombine); 1000EXPORT_SYMBOL_GPL(pgprot_writecombine);
923 1001
1002pgprot_t pgprot_writethrough(pgprot_t prot)
1003{
1004 return __pgprot(pgprot_val(prot) |
1005 cachemode2protval(_PAGE_CACHE_MODE_WT));
1006}
1007EXPORT_SYMBOL_GPL(pgprot_writethrough);
1008
924#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT) 1009#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
925 1010
926static struct memtype *memtype_get_idx(loff_t pos) 1011static struct memtype *memtype_get_idx(loff_t pos)
@@ -996,7 +1081,7 @@ static const struct file_operations memtype_fops = {
996 1081
997static int __init pat_memtype_list_init(void) 1082static int __init pat_memtype_list_init(void)
998{ 1083{
999 if (pat_enabled) { 1084 if (pat_enabled()) {
1000 debugfs_create_file("pat_memtype_list", S_IRUSR, 1085 debugfs_create_file("pat_memtype_list", S_IRUSR,
1001 arch_debugfs_dir, NULL, &memtype_fops); 1086 arch_debugfs_dir, NULL, &memtype_fops);
1002 } 1087 }
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index f6411620305d..a739bfc40690 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -4,7 +4,7 @@
4extern int pat_debug_enable; 4extern int pat_debug_enable;
5 5
6#define dprintk(fmt, arg...) \ 6#define dprintk(fmt, arg...) \
7 do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0) 7 do { if (pat_debug_enable) pr_info("x86/PAT: " fmt, ##arg); } while (0)
8 8
9struct memtype { 9struct memtype {
10 u64 start; 10 u64 start;
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index 6582adcc8bd9..63931080366a 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -160,9 +160,9 @@ success:
160 return 0; 160 return 0;
161 161
162failure: 162failure:
163 printk(KERN_INFO "%s:%d conflicting memory types " 163 pr_info("x86/PAT: %s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
164 "%Lx-%Lx %s<->%s\n", current->comm, current->pid, start, 164 current->comm, current->pid, start, end,
165 end, cattr_name(found_type), cattr_name(match->type)); 165 cattr_name(found_type), cattr_name(match->type));
166 return -EBUSY; 166 return -EBUSY;
167} 167}
168 168
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0b97d2c75df3..fb0a9dd1d6e4 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -563,16 +563,31 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
563} 563}
564 564
565#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP 565#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
566/**
567 * pud_set_huge - setup kernel PUD mapping
568 *
569 * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
570 * function sets up a huge page only if any of the following conditions are met:
571 *
572 * - MTRRs are disabled, or
573 *
574 * - MTRRs are enabled and the range is completely covered by a single MTRR, or
575 *
576 * - MTRRs are enabled and the corresponding MTRR memory type is WB, which
577 * has no effect on the requested PAT memory type.
578 *
579 * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
580 * page mapping attempt fails.
581 *
582 * Returns 1 on success and 0 on failure.
583 */
566int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) 584int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
567{ 585{
568 u8 mtrr; 586 u8 mtrr, uniform;
569 587
570 /* 588 mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
571 * Do not use a huge page when the range is covered by non-WB type 589 if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
572 * of MTRRs. 590 (mtrr != MTRR_TYPE_WRBACK))
573 */
574 mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
575 if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
576 return 0; 591 return 0;
577 592
578 prot = pgprot_4k_2_large(prot); 593 prot = pgprot_4k_2_large(prot);
@@ -584,17 +599,24 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
584 return 1; 599 return 1;
585} 600}
586 601
602/**
603 * pmd_set_huge - setup kernel PMD mapping
604 *
605 * See text over pud_set_huge() above.
606 *
607 * Returns 1 on success and 0 on failure.
608 */
587int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) 609int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
588{ 610{
589 u8 mtrr; 611 u8 mtrr, uniform;
590 612
591 /* 613 mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
592 * Do not use a huge page when the range is covered by non-WB type 614 if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
593 * of MTRRs. 615 (mtrr != MTRR_TYPE_WRBACK)) {
594 */ 616 pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
595 mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE); 617 __func__, addr, addr + PMD_SIZE);
596 if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
597 return 0; 618 return 0;
619 }
598 620
599 prot = pgprot_4k_2_large(prot); 621 prot = pgprot_4k_2_large(prot);
600 622
@@ -605,6 +627,11 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
605 return 1; 627 return 1;
606} 628}
607 629
630/**
631 * pud_clear_huge - clear kernel PUD mapping when it is set
632 *
633 * Returns 1 on success and 0 on failure (no PUD map is found).
634 */
608int pud_clear_huge(pud_t *pud) 635int pud_clear_huge(pud_t *pud)
609{ 636{
610 if (pud_large(*pud)) { 637 if (pud_large(*pud)) {
@@ -615,6 +642,11 @@ int pud_clear_huge(pud_t *pud)
615 return 0; 642 return 0;
616} 643}
617 644
645/**
646 * pmd_clear_huge - clear kernel PMD mapping when it is set
647 *
648 * Returns 1 on success and 0 on failure (no PMD map is found).
649 */
618int pmd_clear_huge(pmd_t *pmd) 650int pmd_clear_huge(pmd_t *pmd)
619{ 651{
620 if (pmd_large(*pmd)) { 652 if (pmd_large(*pmd)) {
diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
index 6440221ced0d..4093216b3791 100644
--- a/arch/x86/net/bpf_jit.S
+++ b/arch/x86/net/bpf_jit.S
@@ -8,7 +8,6 @@
8 * of the License. 8 * of the License.
9 */ 9 */
10#include <linux/linkage.h> 10#include <linux/linkage.h>
11#include <asm/dwarf2.h>
12 11
13/* 12/*
14 * Calling convention : 13 * Calling convention :
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 349c0d32cc0b..0a9f2caf358f 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -429,12 +429,12 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
429 * Caller can followup with UC MINUS request and add a WC mtrr if there 429 * Caller can followup with UC MINUS request and add a WC mtrr if there
430 * is a free mtrr slot. 430 * is a free mtrr slot.
431 */ 431 */
432 if (!pat_enabled && write_combine) 432 if (!pat_enabled() && write_combine)
433 return -EINVAL; 433 return -EINVAL;
434 434
435 if (pat_enabled && write_combine) 435 if (pat_enabled() && write_combine)
436 prot |= cachemode2protval(_PAGE_CACHE_MODE_WC); 436 prot |= cachemode2protval(_PAGE_CACHE_MODE_WC);
437 else if (pat_enabled || boot_cpu_data.x86 > 3) 437 else if (pat_enabled() || boot_cpu_data.x86 > 3)
438 /* 438 /*
439 * ioremap() and ioremap_nocache() defaults to UC MINUS for now. 439 * ioremap() and ioremap_nocache() defaults to UC MINUS for now.
440 * To avoid attribute conflicts, request UC MINUS here 440 * To avoid attribute conflicts, request UC MINUS here
diff --git a/arch/x86/pci/intel_mid_pci.c b/arch/x86/pci/intel_mid_pci.c
index 852aa4c92da0..27062303c881 100644
--- a/arch/x86/pci/intel_mid_pci.c
+++ b/arch/x86/pci/intel_mid_pci.c
@@ -208,6 +208,7 @@ static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
208 208
209static int intel_mid_pci_irq_enable(struct pci_dev *dev) 209static int intel_mid_pci_irq_enable(struct pci_dev *dev)
210{ 210{
211 struct irq_alloc_info info;
211 int polarity; 212 int polarity;
212 213
213 if (dev->irq_managed && dev->irq > 0) 214 if (dev->irq_managed && dev->irq > 0)
@@ -217,14 +218,13 @@ static int intel_mid_pci_irq_enable(struct pci_dev *dev)
217 polarity = 0; /* active high */ 218 polarity = 0; /* active high */
218 else 219 else
219 polarity = 1; /* active low */ 220 polarity = 1; /* active low */
221 ioapic_set_alloc_attr(&info, dev_to_node(&dev->dev), 1, polarity);
220 222
221 /* 223 /*
222 * MRST only have IOAPIC, the PCI irq lines are 1:1 mapped to 224 * MRST only have IOAPIC, the PCI irq lines are 1:1 mapped to
223 * IOAPIC RTE entries, so we just enable RTE for the device. 225 * IOAPIC RTE entries, so we just enable RTE for the device.
224 */ 226 */
225 if (mp_set_gsi_attr(dev->irq, 1, polarity, dev_to_node(&dev->dev))) 227 if (mp_map_gsi_to_irq(dev->irq, IOAPIC_MAP_ALLOC, &info) < 0)
226 return -EBUSY;
227 if (mp_map_gsi_to_irq(dev->irq, IOAPIC_MAP_ALLOC) < 0)
228 return -EBUSY; 228 return -EBUSY;
229 229
230 dev->irq_managed = 1; 230 dev->irq_managed = 1;
diff --git a/arch/x86/pci/irq.c b/arch/x86/pci/irq.c
index 5dc6ca5e1741..9bd115484745 100644
--- a/arch/x86/pci/irq.c
+++ b/arch/x86/pci/irq.c
@@ -146,19 +146,20 @@ static void __init pirq_peer_trick(void)
146 146
147/* 147/*
148 * Code for querying and setting of IRQ routes on various interrupt routers. 148 * Code for querying and setting of IRQ routes on various interrupt routers.
149 * PIC Edge/Level Control Registers (ELCR) 0x4d0 & 0x4d1.
149 */ 150 */
150 151
151void eisa_set_level_irq(unsigned int irq) 152void elcr_set_level_irq(unsigned int irq)
152{ 153{
153 unsigned char mask = 1 << (irq & 7); 154 unsigned char mask = 1 << (irq & 7);
154 unsigned int port = 0x4d0 + (irq >> 3); 155 unsigned int port = 0x4d0 + (irq >> 3);
155 unsigned char val; 156 unsigned char val;
156 static u16 eisa_irq_mask; 157 static u16 elcr_irq_mask;
157 158
158 if (irq >= 16 || (1 << irq) & eisa_irq_mask) 159 if (irq >= 16 || (1 << irq) & elcr_irq_mask)
159 return; 160 return;
160 161
161 eisa_irq_mask |= (1 << irq); 162 elcr_irq_mask |= (1 << irq);
162 printk(KERN_DEBUG "PCI: setting IRQ %u as level-triggered\n", irq); 163 printk(KERN_DEBUG "PCI: setting IRQ %u as level-triggered\n", irq);
163 val = inb(port); 164 val = inb(port);
164 if (!(val & mask)) { 165 if (!(val & mask)) {
@@ -965,11 +966,11 @@ static int pcibios_lookup_irq(struct pci_dev *dev, int assign)
965 } else if (r->get && (irq = r->get(pirq_router_dev, dev, pirq)) && \ 966 } else if (r->get && (irq = r->get(pirq_router_dev, dev, pirq)) && \
966 ((!(pci_probe & PCI_USE_PIRQ_MASK)) || ((1 << irq) & mask))) { 967 ((!(pci_probe & PCI_USE_PIRQ_MASK)) || ((1 << irq) & mask))) {
967 msg = "found"; 968 msg = "found";
968 eisa_set_level_irq(irq); 969 elcr_set_level_irq(irq);
969 } else if (newirq && r->set && 970 } else if (newirq && r->set &&
970 (dev->class >> 8) != PCI_CLASS_DISPLAY_VGA) { 971 (dev->class >> 8) != PCI_CLASS_DISPLAY_VGA) {
971 if (r->set(pirq_router_dev, dev, pirq, newirq)) { 972 if (r->set(pirq_router_dev, dev, pirq, newirq)) {
972 eisa_set_level_irq(newirq); 973 elcr_set_level_irq(newirq);
973 msg = "assigned"; 974 msg = "assigned";
974 irq = newirq; 975 irq = newirq;
975 } 976 }
diff --git a/arch/x86/platform/Makefile b/arch/x86/platform/Makefile
index a62e0be3a2f1..f1a6c8e86ddd 100644
--- a/arch/x86/platform/Makefile
+++ b/arch/x86/platform/Makefile
@@ -1,4 +1,5 @@
1# Platform specific code goes here 1# Platform specific code goes here
2obj-y += atom/
2obj-y += ce4100/ 3obj-y += ce4100/
3obj-y += efi/ 4obj-y += efi/
4obj-y += geode/ 5obj-y += geode/
diff --git a/arch/x86/platform/atom/Makefile b/arch/x86/platform/atom/Makefile
new file mode 100644
index 000000000000..0a3a40cbc794
--- /dev/null
+++ b/arch/x86/platform/atom/Makefile
@@ -0,0 +1 @@
obj-$(CONFIG_PUNIT_ATOM_DEBUG) += punit_atom_debug.o
diff --git a/arch/x86/platform/atom/punit_atom_debug.c b/arch/x86/platform/atom/punit_atom_debug.c
new file mode 100644
index 000000000000..5ca8ead91579
--- /dev/null
+++ b/arch/x86/platform/atom/punit_atom_debug.c
@@ -0,0 +1,183 @@
1/*
2 * Intel SOC Punit device state debug driver
3 * Punit controls power management for North Complex devices (Graphics
4 * blocks, Image Signal Processing, video processing, display, DSP etc.)
5 *
6 * Copyright (c) 2015, Intel Corporation.
7 *
8 * This program is free software; you can redistribute it and/or modify it
9 * under the terms and conditions of the GNU General Public License,
10 * version 2, as published by the Free Software Foundation.
11 *
12 * This program is distributed in the hope it will be useful, but WITHOUT
13 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
14 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
15 * more details.
16 *
17 */
18
19#include <linux/module.h>
20#include <linux/init.h>
21#include <linux/device.h>
22#include <linux/debugfs.h>
23#include <linux/seq_file.h>
24#include <linux/io.h>
25#include <asm/cpu_device_id.h>
26#include <asm/iosf_mbi.h>
27
28/* Side band Interface port */
29#define PUNIT_PORT 0x04
30/* Power gate status reg */
31#define PWRGT_STATUS 0x61
32/* Subsystem config/status Video processor */
33#define VED_SS_PM0 0x32
34/* Subsystem config/status ISP (Image Signal Processor) */
35#define ISP_SS_PM0 0x39
36/* Subsystem config/status Input/output controller */
37#define MIO_SS_PM 0x3B
38/* Shift bits for getting status for video, isp and i/o */
39#define SSS_SHIFT 24
40/* Shift bits for getting status for graphics rendering */
41#define RENDER_POS 0
42/* Shift bits for getting status for media control */
43#define MEDIA_POS 2
44/* Shift bits for getting status for Valley View/Baytrail display */
45#define VLV_DISPLAY_POS 6
46/* Subsystem config/status display for Cherry Trail SOC */
47#define CHT_DSP_SSS 0x36
48/* Shift bits for getting status for display */
49#define CHT_DSP_SSS_POS 16
50
51struct punit_device {
52 char *name;
53 int reg;
54 int sss_pos;
55};
56
57static const struct punit_device punit_device_byt[] = {
58 { "GFX RENDER", PWRGT_STATUS, RENDER_POS },
59 { "GFX MEDIA", PWRGT_STATUS, MEDIA_POS },
60 { "DISPLAY", PWRGT_STATUS, VLV_DISPLAY_POS },
61 { "VED", VED_SS_PM0, SSS_SHIFT },
62 { "ISP", ISP_SS_PM0, SSS_SHIFT },
63 { "MIO", MIO_SS_PM, SSS_SHIFT },
64 { NULL }
65};
66
67static const struct punit_device punit_device_cht[] = {
68 { "GFX RENDER", PWRGT_STATUS, RENDER_POS },
69 { "GFX MEDIA", PWRGT_STATUS, MEDIA_POS },
70 { "DISPLAY", CHT_DSP_SSS, CHT_DSP_SSS_POS },
71 { "VED", VED_SS_PM0, SSS_SHIFT },
72 { "ISP", ISP_SS_PM0, SSS_SHIFT },
73 { "MIO", MIO_SS_PM, SSS_SHIFT },
74 { NULL }
75};
76
77static const char * const dstates[] = {"D0", "D0i1", "D0i2", "D0i3"};
78
79static int punit_dev_state_show(struct seq_file *seq_file, void *unused)
80{
81 u32 punit_pwr_status;
82 struct punit_device *punit_devp = seq_file->private;
83 int index;
84 int status;
85
86 seq_puts(seq_file, "\n\nPUNIT NORTH COMPLEX DEVICES :\n");
87 while (punit_devp->name) {
88 status = iosf_mbi_read(PUNIT_PORT, BT_MBI_PMC_READ,
89 punit_devp->reg,
90 &punit_pwr_status);
91 if (status) {
92 seq_printf(seq_file, "%9s : Read Failed\n",
93 punit_devp->name);
94 } else {
95 index = (punit_pwr_status >> punit_devp->sss_pos) & 3;
96 seq_printf(seq_file, "%9s : %s\n", punit_devp->name,
97 dstates[index]);
98 }
99 punit_devp++;
100 }
101
102 return 0;
103}
104
105static int punit_dev_state_open(struct inode *inode, struct file *file)
106{
107 return single_open(file, punit_dev_state_show, inode->i_private);
108}
109
110static const struct file_operations punit_dev_state_ops = {
111 .open = punit_dev_state_open,
112 .read = seq_read,
113 .llseek = seq_lseek,
114 .release = single_release,
115};
116
117static struct dentry *punit_dbg_file;
118
119static int punit_dbgfs_register(struct punit_device *punit_device)
120{
121 static struct dentry *dev_state;
122
123 punit_dbg_file = debugfs_create_dir("punit_atom", NULL);
124 if (!punit_dbg_file)
125 return -ENXIO;
126
127 dev_state = debugfs_create_file("dev_power_state", S_IFREG | S_IRUGO,
128 punit_dbg_file, punit_device,
129 &punit_dev_state_ops);
130 if (!dev_state) {
131 pr_err("punit_dev_state register failed\n");
132 debugfs_remove(punit_dbg_file);
133 return -ENXIO;
134 }
135
136 return 0;
137}
138
139static void punit_dbgfs_unregister(void)
140{
141 debugfs_remove_recursive(punit_dbg_file);
142}
143
144#define ICPU(model, drv_data) \
145 { X86_VENDOR_INTEL, 6, model, X86_FEATURE_MWAIT,\
146 (kernel_ulong_t)&drv_data }
147
148static const struct x86_cpu_id intel_punit_cpu_ids[] = {
149 ICPU(55, punit_device_byt), /* Valleyview, Bay Trail */
150 ICPU(76, punit_device_cht), /* Braswell, Cherry Trail */
151 {}
152};
153
154MODULE_DEVICE_TABLE(x86cpu, intel_punit_cpu_ids);
155
156static int __init punit_atom_debug_init(void)
157{
158 const struct x86_cpu_id *id;
159 int ret;
160
161 id = x86_match_cpu(intel_punit_cpu_ids);
162 if (!id)
163 return -ENODEV;
164
165 ret = punit_dbgfs_register((struct punit_device *)id->driver_data);
166 if (ret < 0)
167 return ret;
168
169 return 0;
170}
171
172static void __exit punit_atom_debug_exit(void)
173{
174 punit_dbgfs_unregister();
175}
176
177module_init(punit_atom_debug_init);
178module_exit(punit_atom_debug_exit);
179
180MODULE_AUTHOR("Kumar P, Mahesh <mahesh.kumar.p@intel.com>");
181MODULE_AUTHOR("Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>");
182MODULE_DESCRIPTION("Driver for Punit devices states debugging");
183MODULE_LICENSE("GPL v2");
diff --git a/arch/x86/platform/intel-mid/device_libs/platform_wdt.c b/arch/x86/platform/intel-mid/device_libs/platform_wdt.c
index 0b283d4d0ad7..de734134bc8d 100644
--- a/arch/x86/platform/intel-mid/device_libs/platform_wdt.c
+++ b/arch/x86/platform/intel-mid/device_libs/platform_wdt.c
@@ -27,6 +27,7 @@ static struct platform_device wdt_dev = {
27static int tangier_probe(struct platform_device *pdev) 27static int tangier_probe(struct platform_device *pdev)
28{ 28{
29 int gsi; 29 int gsi;
30 struct irq_alloc_info info;
30 struct intel_mid_wdt_pdata *pdata = pdev->dev.platform_data; 31 struct intel_mid_wdt_pdata *pdata = pdev->dev.platform_data;
31 32
32 if (!pdata) 33 if (!pdata)
@@ -34,8 +35,8 @@ static int tangier_probe(struct platform_device *pdev)
34 35
35 /* IOAPIC builds identity mapping between GSI and IRQ on MID */ 36 /* IOAPIC builds identity mapping between GSI and IRQ on MID */
36 gsi = pdata->irq; 37 gsi = pdata->irq;
37 if (mp_set_gsi_attr(gsi, 1, 0, cpu_to_node(0)) || 38 ioapic_set_alloc_attr(&info, cpu_to_node(0), 1, 0);
38 mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC) <= 0) { 39 if (mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC, &info) <= 0) {
39 dev_warn(&pdev->dev, "cannot find interrupt %d in ioapic\n", 40 dev_warn(&pdev->dev, "cannot find interrupt %d in ioapic\n",
40 gsi); 41 gsi);
41 return -EINVAL; 42 return -EINVAL;
diff --git a/arch/x86/platform/intel-mid/intel-mid.c b/arch/x86/platform/intel-mid/intel-mid.c
index 3005f0c89f2e..01d54ea766c1 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -81,26 +81,34 @@ static unsigned long __init intel_mid_calibrate_tsc(void)
81 return 0; 81 return 0;
82} 82}
83 83
84static void __init intel_mid_setup_bp_timer(void)
85{
86 apbt_time_init();
87 setup_boot_APIC_clock();
88}
89
84static void __init intel_mid_time_init(void) 90static void __init intel_mid_time_init(void)
85{ 91{
86 sfi_table_parse(SFI_SIG_MTMR, NULL, NULL, sfi_parse_mtmr); 92 sfi_table_parse(SFI_SIG_MTMR, NULL, NULL, sfi_parse_mtmr);
93
87 switch (intel_mid_timer_options) { 94 switch (intel_mid_timer_options) {
88 case INTEL_MID_TIMER_APBT_ONLY: 95 case INTEL_MID_TIMER_APBT_ONLY:
89 break; 96 break;
90 case INTEL_MID_TIMER_LAPIC_APBT: 97 case INTEL_MID_TIMER_LAPIC_APBT:
91 x86_init.timers.setup_percpu_clockev = setup_boot_APIC_clock; 98 /* Use apbt and local apic */
99 x86_init.timers.setup_percpu_clockev = intel_mid_setup_bp_timer;
92 x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock; 100 x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock;
93 break; 101 return;
94 default: 102 default:
95 if (!boot_cpu_has(X86_FEATURE_ARAT)) 103 if (!boot_cpu_has(X86_FEATURE_ARAT))
96 break; 104 break;
105 /* Lapic only, no apbt */
97 x86_init.timers.setup_percpu_clockev = setup_boot_APIC_clock; 106 x86_init.timers.setup_percpu_clockev = setup_boot_APIC_clock;
98 x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock; 107 x86_cpuinit.setup_percpu_clockev = setup_secondary_APIC_clock;
99 return; 108 return;
100 } 109 }
101 /* we need at least one APB timer */ 110
102 pre_init_apic_IRQ0(); 111 x86_init.timers.setup_percpu_clockev = apbt_time_init;
103 apbt_time_init();
104} 112}
105 113
106static void intel_mid_arch_setup(void) 114static void intel_mid_arch_setup(void)
diff --git a/arch/x86/platform/intel-mid/sfi.c b/arch/x86/platform/intel-mid/sfi.c
index c14ad34776c4..ce992e8cc065 100644
--- a/arch/x86/platform/intel-mid/sfi.c
+++ b/arch/x86/platform/intel-mid/sfi.c
@@ -95,18 +95,16 @@ int __init sfi_parse_mtmr(struct sfi_table_header *table)
95 pr_debug("timer[%d]: paddr = 0x%08x, freq = %dHz, irq = %d\n", 95 pr_debug("timer[%d]: paddr = 0x%08x, freq = %dHz, irq = %d\n",
96 totallen, (u32)pentry->phys_addr, 96 totallen, (u32)pentry->phys_addr,
97 pentry->freq_hz, pentry->irq); 97 pentry->freq_hz, pentry->irq);
98 if (!pentry->irq) 98 mp_irq.type = MP_INTSRC;
99 continue; 99 mp_irq.irqtype = mp_INT;
100 mp_irq.type = MP_INTSRC; 100 /* triggering mode edge bit 2-3, active high polarity bit 0-1 */
101 mp_irq.irqtype = mp_INT; 101 mp_irq.irqflag = 5;
102/* triggering mode edge bit 2-3, active high polarity bit 0-1 */ 102 mp_irq.srcbus = MP_BUS_ISA;
103 mp_irq.irqflag = 5; 103 mp_irq.srcbusirq = pentry->irq; /* IRQ */
104 mp_irq.srcbus = MP_BUS_ISA; 104 mp_irq.dstapic = MP_APIC_ALL;
105 mp_irq.srcbusirq = pentry->irq; /* IRQ */ 105 mp_irq.dstirq = pentry->irq;
106 mp_irq.dstapic = MP_APIC_ALL; 106 mp_save_irq(&mp_irq);
107 mp_irq.dstirq = pentry->irq; 107 mp_map_gsi_to_irq(pentry->irq, IOAPIC_MAP_ALLOC, NULL);
108 mp_save_irq(&mp_irq);
109 mp_map_gsi_to_irq(pentry->irq, IOAPIC_MAP_ALLOC);
110 } 108 }
111 109
112 return 0; 110 return 0;
@@ -177,7 +175,7 @@ int __init sfi_parse_mrtc(struct sfi_table_header *table)
177 mp_irq.dstapic = MP_APIC_ALL; 175 mp_irq.dstapic = MP_APIC_ALL;
178 mp_irq.dstirq = pentry->irq; 176 mp_irq.dstirq = pentry->irq;
179 mp_save_irq(&mp_irq); 177 mp_save_irq(&mp_irq);
180 mp_map_gsi_to_irq(pentry->irq, IOAPIC_MAP_ALLOC); 178 mp_map_gsi_to_irq(pentry->irq, IOAPIC_MAP_ALLOC, NULL);
181 } 179 }
182 return 0; 180 return 0;
183} 181}
@@ -436,6 +434,7 @@ static int __init sfi_parse_devs(struct sfi_table_header *table)
436 struct devs_id *dev = NULL; 434 struct devs_id *dev = NULL;
437 int num, i, ret; 435 int num, i, ret;
438 int polarity; 436 int polarity;
437 struct irq_alloc_info info;
439 438
440 sb = (struct sfi_table_simple *)table; 439 sb = (struct sfi_table_simple *)table;
441 num = SFI_GET_NUM_ENTRIES(sb, struct sfi_device_table_entry); 440 num = SFI_GET_NUM_ENTRIES(sb, struct sfi_device_table_entry);
@@ -469,9 +468,8 @@ static int __init sfi_parse_devs(struct sfi_table_header *table)
469 polarity = 1; 468 polarity = 1;
470 } 469 }
471 470
472 ret = mp_set_gsi_attr(irq, 1, polarity, NUMA_NO_NODE); 471 ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 1, polarity);
473 if (ret == 0) 472 ret = mp_map_gsi_to_irq(irq, IOAPIC_MAP_ALLOC, &info);
474 ret = mp_map_gsi_to_irq(irq, IOAPIC_MAP_ALLOC);
475 WARN_ON(ret < 0); 473 WARN_ON(ret < 0);
476 } 474 }
477 475
diff --git a/arch/x86/platform/sfi/sfi.c b/arch/x86/platform/sfi/sfi.c
index 2a8a74f3bd76..6c7111bbd1e9 100644
--- a/arch/x86/platform/sfi/sfi.c
+++ b/arch/x86/platform/sfi/sfi.c
@@ -25,8 +25,8 @@
25#include <linux/init.h> 25#include <linux/init.h>
26#include <linux/sfi.h> 26#include <linux/sfi.h>
27#include <linux/io.h> 27#include <linux/io.h>
28#include <linux/irqdomain.h>
29 28
29#include <asm/irqdomain.h>
30#include <asm/io_apic.h> 30#include <asm/io_apic.h>
31#include <asm/mpspec.h> 31#include <asm/mpspec.h>
32#include <asm/setup.h> 32#include <asm/setup.h>
@@ -71,9 +71,6 @@ static int __init sfi_parse_cpus(struct sfi_table_header *table)
71#endif /* CONFIG_X86_LOCAL_APIC */ 71#endif /* CONFIG_X86_LOCAL_APIC */
72 72
73#ifdef CONFIG_X86_IO_APIC 73#ifdef CONFIG_X86_IO_APIC
74static struct irq_domain_ops sfi_ioapic_irqdomain_ops = {
75 .map = mp_irqdomain_map,
76};
77 74
78static int __init sfi_parse_ioapic(struct sfi_table_header *table) 75static int __init sfi_parse_ioapic(struct sfi_table_header *table)
79{ 76{
@@ -82,7 +79,7 @@ static int __init sfi_parse_ioapic(struct sfi_table_header *table)
82 int i, num; 79 int i, num;
83 struct ioapic_domain_cfg cfg = { 80 struct ioapic_domain_cfg cfg = {
84 .type = IOAPIC_DOMAIN_STRICT, 81 .type = IOAPIC_DOMAIN_STRICT,
85 .ops = &sfi_ioapic_irqdomain_ops, 82 .ops = &mp_ioapic_irqdomain_ops,
86 }; 83 };
87 84
88 sb = (struct sfi_table_simple *)table; 85 sb = (struct sfi_table_simple *)table;
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index 0ce673645432..8570abe68be1 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -13,22 +13,37 @@
13#include <linux/slab.h> 13#include <linux/slab.h>
14#include <linux/irq.h> 14#include <linux/irq.h>
15 15
16#include <asm/irqdomain.h>
16#include <asm/apic.h> 17#include <asm/apic.h>
17#include <asm/uv/uv_irq.h> 18#include <asm/uv/uv_irq.h>
18#include <asm/uv/uv_hub.h> 19#include <asm/uv/uv_hub.h>
19 20
20/* MMR offset and pnode of hub sourcing interrupts for a given irq */ 21/* MMR offset and pnode of hub sourcing interrupts for a given irq */
21struct uv_irq_2_mmr_pnode{ 22struct uv_irq_2_mmr_pnode {
22 struct rb_node list;
23 unsigned long offset; 23 unsigned long offset;
24 int pnode; 24 int pnode;
25 int irq;
26}; 25};
27 26
28static DEFINE_SPINLOCK(uv_irq_lock); 27static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
29static struct rb_root uv_irq_root; 28{
29 unsigned long mmr_value;
30 struct uv_IO_APIC_route_entry *entry;
31
32 BUILD_BUG_ON(sizeof(struct uv_IO_APIC_route_entry) !=
33 sizeof(unsigned long));
34
35 mmr_value = 0;
36 entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
37 entry->vector = cfg->vector;
38 entry->delivery_mode = apic->irq_delivery_mode;
39 entry->dest_mode = apic->irq_dest_mode;
40 entry->polarity = 0;
41 entry->trigger = 0;
42 entry->mask = 0;
43 entry->dest = cfg->dest_apicid;
30 44
31static int uv_set_irq_affinity(struct irq_data *, const struct cpumask *, bool); 45 uv_write_global_mmr64(info->pnode, info->offset, mmr_value);
46}
32 47
33static void uv_noop(struct irq_data *data) { } 48static void uv_noop(struct irq_data *data) { }
34 49
@@ -37,6 +52,23 @@ static void uv_ack_apic(struct irq_data *data)
37 ack_APIC_irq(); 52 ack_APIC_irq();
38} 53}
39 54
55static int
56uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
57 bool force)
58{
59 struct irq_data *parent = data->parent_data;
60 struct irq_cfg *cfg = irqd_cfg(data);
61 int ret;
62
63 ret = parent->chip->irq_set_affinity(parent, mask, force);
64 if (ret >= 0) {
65 uv_program_mmr(cfg, data->chip_data);
66 send_cleanup_vector(cfg);
67 }
68
69 return ret;
70}
71
40static struct irq_chip uv_irq_chip = { 72static struct irq_chip uv_irq_chip = {
41 .name = "UV-CORE", 73 .name = "UV-CORE",
42 .irq_mask = uv_noop, 74 .irq_mask = uv_noop,
@@ -45,189 +77,99 @@ static struct irq_chip uv_irq_chip = {
45 .irq_set_affinity = uv_set_irq_affinity, 77 .irq_set_affinity = uv_set_irq_affinity,
46}; 78};
47 79
48/* 80static int uv_domain_alloc(struct irq_domain *domain, unsigned int virq,
49 * Add offset and pnode information of the hub sourcing interrupts to the 81 unsigned int nr_irqs, void *arg)
50 * rb tree for a specific irq.
51 */
52static int uv_set_irq_2_mmr_info(int irq, unsigned long offset, unsigned blade)
53{ 82{
54 struct rb_node **link = &uv_irq_root.rb_node; 83 struct uv_irq_2_mmr_pnode *chip_data;
55 struct rb_node *parent = NULL; 84 struct irq_alloc_info *info = arg;
56 struct uv_irq_2_mmr_pnode *n; 85 struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
57 struct uv_irq_2_mmr_pnode *e; 86 int ret;
58 unsigned long irqflags; 87
59 88 if (nr_irqs > 1 || !info || info->type != X86_IRQ_ALLOC_TYPE_UV)
60 n = kmalloc_node(sizeof(struct uv_irq_2_mmr_pnode), GFP_KERNEL, 89 return -EINVAL;
61 uv_blade_to_memory_nid(blade)); 90
62 if (!n) 91 chip_data = kmalloc_node(sizeof(*chip_data), GFP_KERNEL,
92 irq_data->node);
93 if (!chip_data)
63 return -ENOMEM; 94 return -ENOMEM;
64 95
65 n->irq = irq; 96 ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
66 n->offset = offset; 97 if (ret >= 0) {
67 n->pnode = uv_blade_to_pnode(blade); 98 if (info->uv_limit == UV_AFFINITY_CPU)
68 spin_lock_irqsave(&uv_irq_lock, irqflags); 99 irq_set_status_flags(virq, IRQ_NO_BALANCING);
69 /* Find the right place in the rbtree: */
70 while (*link) {
71 parent = *link;
72 e = rb_entry(parent, struct uv_irq_2_mmr_pnode, list);
73
74 if (unlikely(irq == e->irq)) {
75 /* irq entry exists */
76 e->pnode = uv_blade_to_pnode(blade);
77 e->offset = offset;
78 spin_unlock_irqrestore(&uv_irq_lock, irqflags);
79 kfree(n);
80 return 0;
81 }
82
83 if (irq < e->irq)
84 link = &(*link)->rb_left;
85 else 100 else
86 link = &(*link)->rb_right; 101 irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
102
103 chip_data->pnode = uv_blade_to_pnode(info->uv_blade);
104 chip_data->offset = info->uv_offset;
105 irq_domain_set_info(domain, virq, virq, &uv_irq_chip, chip_data,
106 handle_percpu_irq, NULL, info->uv_name);
107 } else {
108 kfree(chip_data);
87 } 109 }
88 110
89 /* Insert the node into the rbtree. */ 111 return ret;
90 rb_link_node(&n->list, parent, link);
91 rb_insert_color(&n->list, &uv_irq_root);
92
93 spin_unlock_irqrestore(&uv_irq_lock, irqflags);
94 return 0;
95} 112}
96 113
97/* Retrieve offset and pnode information from the rb tree for a specific irq */ 114static void uv_domain_free(struct irq_domain *domain, unsigned int virq,
98int uv_irq_2_mmr_info(int irq, unsigned long *offset, int *pnode) 115 unsigned int nr_irqs)
99{ 116{
100 struct uv_irq_2_mmr_pnode *e; 117 struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
101 struct rb_node *n; 118
102 unsigned long irqflags; 119 BUG_ON(nr_irqs != 1);
103 120 kfree(irq_data->chip_data);
104 spin_lock_irqsave(&uv_irq_lock, irqflags); 121 irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
105 n = uv_irq_root.rb_node; 122 irq_clear_status_flags(virq, IRQ_NO_BALANCING);
106 while (n) { 123 irq_domain_free_irqs_top(domain, virq, nr_irqs);
107 e = rb_entry(n, struct uv_irq_2_mmr_pnode, list);
108
109 if (e->irq == irq) {
110 *offset = e->offset;
111 *pnode = e->pnode;
112 spin_unlock_irqrestore(&uv_irq_lock, irqflags);
113 return 0;
114 }
115
116 if (irq < e->irq)
117 n = n->rb_left;
118 else
119 n = n->rb_right;
120 }
121 spin_unlock_irqrestore(&uv_irq_lock, irqflags);
122 return -1;
123} 124}
124 125
125/* 126/*
126 * Re-target the irq to the specified CPU and enable the specified MMR located 127 * Re-target the irq to the specified CPU and enable the specified MMR located
127 * on the specified blade to allow the sending of MSIs to the specified CPU. 128 * on the specified blade to allow the sending of MSIs to the specified CPU.
128 */ 129 */
129static int 130static void uv_domain_activate(struct irq_domain *domain,
130arch_enable_uv_irq(char *irq_name, unsigned int irq, int cpu, int mmr_blade, 131 struct irq_data *irq_data)
131 unsigned long mmr_offset, int limit)
132{ 132{
133 const struct cpumask *eligible_cpu = cpumask_of(cpu); 133 uv_program_mmr(irqd_cfg(irq_data), irq_data->chip_data);
134 struct irq_cfg *cfg = irq_cfg(irq);
135 unsigned long mmr_value;
136 struct uv_IO_APIC_route_entry *entry;
137 int mmr_pnode, err;
138 unsigned int dest;
139
140 BUILD_BUG_ON(sizeof(struct uv_IO_APIC_route_entry) !=
141 sizeof(unsigned long));
142
143 err = assign_irq_vector(irq, cfg, eligible_cpu);
144 if (err != 0)
145 return err;
146
147 err = apic->cpu_mask_to_apicid_and(eligible_cpu, eligible_cpu, &dest);
148 if (err != 0)
149 return err;
150
151 if (limit == UV_AFFINITY_CPU)
152 irq_set_status_flags(irq, IRQ_NO_BALANCING);
153 else
154 irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
155
156 irq_set_chip_and_handler_name(irq, &uv_irq_chip, handle_percpu_irq,
157 irq_name);
158
159 mmr_value = 0;
160 entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
161 entry->vector = cfg->vector;
162 entry->delivery_mode = apic->irq_delivery_mode;
163 entry->dest_mode = apic->irq_dest_mode;
164 entry->polarity = 0;
165 entry->trigger = 0;
166 entry->mask = 0;
167 entry->dest = dest;
168
169 mmr_pnode = uv_blade_to_pnode(mmr_blade);
170 uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
171
172 if (cfg->move_in_progress)
173 send_cleanup_vector(cfg);
174
175 return irq;
176} 134}
177 135
178/* 136/*
179 * Disable the specified MMR located on the specified blade so that MSIs are 137 * Disable the specified MMR located on the specified blade so that MSIs are
180 * longer allowed to be sent. 138 * longer allowed to be sent.
181 */ 139 */
182static void arch_disable_uv_irq(int mmr_pnode, unsigned long mmr_offset) 140static void uv_domain_deactivate(struct irq_domain *domain,
141 struct irq_data *irq_data)
183{ 142{
184 unsigned long mmr_value; 143 unsigned long mmr_value;
185 struct uv_IO_APIC_route_entry *entry; 144 struct uv_IO_APIC_route_entry *entry;
186 145
187 BUILD_BUG_ON(sizeof(struct uv_IO_APIC_route_entry) !=
188 sizeof(unsigned long));
189
190 mmr_value = 0; 146 mmr_value = 0;
191 entry = (struct uv_IO_APIC_route_entry *)&mmr_value; 147 entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
192 entry->mask = 1; 148 entry->mask = 1;
193 149 uv_program_mmr(irqd_cfg(irq_data), irq_data->chip_data);
194 uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
195} 150}
196 151
197static int 152static const struct irq_domain_ops uv_domain_ops = {
198uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask, 153 .alloc = uv_domain_alloc,
199 bool force) 154 .free = uv_domain_free,
200{ 155 .activate = uv_domain_activate,
201 struct irq_cfg *cfg = irqd_cfg(data); 156 .deactivate = uv_domain_deactivate,
202 unsigned int dest; 157};
203 unsigned long mmr_value, mmr_offset;
204 struct uv_IO_APIC_route_entry *entry;
205 int mmr_pnode;
206
207 if (apic_set_affinity(data, mask, &dest))
208 return -1;
209
210 mmr_value = 0;
211 entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
212
213 entry->vector = cfg->vector;
214 entry->delivery_mode = apic->irq_delivery_mode;
215 entry->dest_mode = apic->irq_dest_mode;
216 entry->polarity = 0;
217 entry->trigger = 0;
218 entry->mask = 0;
219 entry->dest = dest;
220
221 /* Get previously stored MMR and pnode of hub sourcing interrupts */
222 if (uv_irq_2_mmr_info(data->irq, &mmr_offset, &mmr_pnode))
223 return -1;
224
225 uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
226 158
227 if (cfg->move_in_progress) 159static struct irq_domain *uv_get_irq_domain(void)
228 send_cleanup_vector(cfg); 160{
161 static struct irq_domain *uv_domain;
162 static DEFINE_MUTEX(uv_lock);
163
164 mutex_lock(&uv_lock);
165 if (uv_domain == NULL) {
166 uv_domain = irq_domain_add_tree(NULL, &uv_domain_ops, NULL);
167 if (uv_domain)
168 uv_domain->parent = x86_vector_domain;
169 }
170 mutex_unlock(&uv_lock);
229 171
230 return IRQ_SET_MASK_OK_NOCOPY; 172 return uv_domain;
231} 173}
232 174
233/* 175/*
@@ -238,19 +180,21 @@ uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
238int uv_setup_irq(char *irq_name, int cpu, int mmr_blade, 180int uv_setup_irq(char *irq_name, int cpu, int mmr_blade,
239 unsigned long mmr_offset, int limit) 181 unsigned long mmr_offset, int limit)
240{ 182{
241 int ret, irq = irq_alloc_hwirq(uv_blade_to_memory_nid(mmr_blade)); 183 struct irq_alloc_info info;
184 struct irq_domain *domain = uv_get_irq_domain();
242 185
243 if (!irq) 186 if (!domain)
244 return -EBUSY; 187 return -ENOMEM;
245 188
246 ret = arch_enable_uv_irq(irq_name, irq, cpu, mmr_blade, mmr_offset, 189 init_irq_alloc_info(&info, cpumask_of(cpu));
247 limit); 190 info.type = X86_IRQ_ALLOC_TYPE_UV;
248 if (ret == irq) 191 info.uv_limit = limit;
249 uv_set_irq_2_mmr_info(irq, mmr_offset, mmr_blade); 192 info.uv_blade = mmr_blade;
250 else 193 info.uv_offset = mmr_offset;
251 irq_free_hwirq(irq); 194 info.uv_name = irq_name;
252 195
253 return ret; 196 return irq_domain_alloc_irqs(domain, 1,
197 uv_blade_to_memory_nid(mmr_blade), &info);
254} 198}
255EXPORT_SYMBOL_GPL(uv_setup_irq); 199EXPORT_SYMBOL_GPL(uv_setup_irq);
256 200
@@ -263,26 +207,6 @@ EXPORT_SYMBOL_GPL(uv_setup_irq);
263 */ 207 */
264void uv_teardown_irq(unsigned int irq) 208void uv_teardown_irq(unsigned int irq)
265{ 209{
266 struct uv_irq_2_mmr_pnode *e; 210 irq_domain_free_irqs(irq, 1);
267 struct rb_node *n;
268 unsigned long irqflags;
269
270 spin_lock_irqsave(&uv_irq_lock, irqflags);
271 n = uv_irq_root.rb_node;
272 while (n) {
273 e = rb_entry(n, struct uv_irq_2_mmr_pnode, list);
274 if (e->irq == irq) {
275 arch_disable_uv_irq(e->pnode, e->offset);
276 rb_erase(n, &uv_irq_root);
277 kfree(e);
278 break;
279 }
280 if (irq < e->irq)
281 n = n->rb_left;
282 else
283 n = n->rb_right;
284 }
285 spin_unlock_irqrestore(&uv_irq_lock, irqflags);
286 irq_free_hwirq(irq);
287} 211}
288EXPORT_SYMBOL_GPL(uv_teardown_irq); 212EXPORT_SYMBOL_GPL(uv_teardown_irq);
diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index 3c4469a7a929..e2386cb4e0c3 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -78,9 +78,9 @@ ENTRY(restore_image)
78 78
79 /* code below has been relocated to a safe page */ 79 /* code below has been relocated to a safe page */
80ENTRY(core_restore_code) 80ENTRY(core_restore_code)
81loop: 81.Lloop:
82 testq %rdx, %rdx 82 testq %rdx, %rdx
83 jz done 83 jz .Ldone
84 84
85 /* get addresses from the pbe and copy the page */ 85 /* get addresses from the pbe and copy the page */
86 movq pbe_address(%rdx), %rsi 86 movq pbe_address(%rdx), %rsi
@@ -91,8 +91,8 @@ loop:
91 91
92 /* progress to the next pbe */ 92 /* progress to the next pbe */
93 movq pbe_next(%rdx), %rdx 93 movq pbe_next(%rdx), %rdx
94 jmp loop 94 jmp .Lloop
95done: 95.Ldone:
96 /* jump to the restore_registers address from the image header */ 96 /* jump to the restore_registers address from the image header */
97 jmpq *%rax 97 jmpq *%rax
98 /* 98 /*
diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile
index acb384d24669..a8fecc226946 100644
--- a/arch/x86/um/Makefile
+++ b/arch/x86/um/Makefile
@@ -26,7 +26,7 @@ else
26 26
27obj-y += syscalls_64.o vdso/ 27obj-y += syscalls_64.o vdso/
28 28
29subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o ../lib/thunk_64.o \ 29subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o ../entry/thunk_64.o \
30 ../lib/rwsem.o 30 ../lib/rwsem.o
31 31
32endif 32endif
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 98088bf5906a..0b95c9b8283f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1181,10 +1181,11 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
1181 .read_tscp = native_read_tscp, 1181 .read_tscp = native_read_tscp,
1182 1182
1183 .iret = xen_iret, 1183 .iret = xen_iret,
1184 .irq_enable_sysexit = xen_sysexit,
1185#ifdef CONFIG_X86_64 1184#ifdef CONFIG_X86_64
1186 .usergs_sysret32 = xen_sysret32, 1185 .usergs_sysret32 = xen_sysret32,
1187 .usergs_sysret64 = xen_sysret64, 1186 .usergs_sysret64 = xen_sysret64,
1187#else
1188 .irq_enable_sysexit = xen_sysexit,
1188#endif 1189#endif
1189 1190
1190 .load_tr_desc = paravirt_nop, 1191 .load_tr_desc = paravirt_nop,
@@ -1467,6 +1468,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
1467{ 1468{
1468 struct physdev_set_iopl set_iopl; 1469 struct physdev_set_iopl set_iopl;
1469 unsigned long initrd_start = 0; 1470 unsigned long initrd_start = 0;
1471 u64 pat;
1470 int rc; 1472 int rc;
1471 1473
1472 if (!xen_start_info) 1474 if (!xen_start_info)
@@ -1574,8 +1576,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
1574 * Modify the cache mode translation tables to match Xen's PAT 1576 * Modify the cache mode translation tables to match Xen's PAT
1575 * configuration. 1577 * configuration.
1576 */ 1578 */
1577 1579 rdmsrl(MSR_IA32_CR_PAT, pat);
1578 pat_init_cache_modes(); 1580 pat_init_cache_modes(pat);
1579 1581
1580 /* keep using Xen gdt for now; no urgent need to change it */ 1582 /* keep using Xen gdt for now; no urgent need to change it */
1581 1583
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index b47124d4cd67..8b7f18e200aa 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -67,6 +67,7 @@
67#include <linux/seq_file.h> 67#include <linux/seq_file.h>
68#include <linux/bootmem.h> 68#include <linux/bootmem.h>
69#include <linux/slab.h> 69#include <linux/slab.h>
70#include <linux/vmalloc.h>
70 71
71#include <asm/cache.h> 72#include <asm/cache.h>
72#include <asm/setup.h> 73#include <asm/setup.h>
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 985fc3ee0973..f22667abf7b9 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -15,6 +15,8 @@
15#include <asm/percpu.h> 15#include <asm/percpu.h>
16#include <asm/processor-flags.h> 16#include <asm/processor-flags.h>
17#include <asm/segment.h> 17#include <asm/segment.h>
18#include <asm/asm-offsets.h>
19#include <asm/thread_info.h>
18 20
19#include <xen/interface/xen.h> 21#include <xen/interface/xen.h>
20 22
@@ -47,29 +49,13 @@ ENTRY(xen_iret)
47ENDPATCH(xen_iret) 49ENDPATCH(xen_iret)
48RELOC(xen_iret, 1b+1) 50RELOC(xen_iret, 1b+1)
49 51
50/*
51 * sysexit is not used for 64-bit processes, so it's only ever used to
52 * return to 32-bit compat userspace.
53 */
54ENTRY(xen_sysexit)
55 pushq $__USER32_DS
56 pushq %rcx
57 pushq $X86_EFLAGS_IF
58 pushq $__USER32_CS
59 pushq %rdx
60
61 pushq $0
621: jmp hypercall_iret
63ENDPATCH(xen_sysexit)
64RELOC(xen_sysexit, 1b+1)
65
66ENTRY(xen_sysret64) 52ENTRY(xen_sysret64)
67 /* 53 /*
68 * We're already on the usermode stack at this point, but 54 * We're already on the usermode stack at this point, but
69 * still with the kernel gs, so we can easily switch back 55 * still with the kernel gs, so we can easily switch back
70 */ 56 */
71 movq %rsp, PER_CPU_VAR(rsp_scratch) 57 movq %rsp, PER_CPU_VAR(rsp_scratch)
72 movq PER_CPU_VAR(kernel_stack), %rsp 58 movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
73 59
74 pushq $__USER_DS 60 pushq $__USER_DS
75 pushq PER_CPU_VAR(rsp_scratch) 61 pushq PER_CPU_VAR(rsp_scratch)
@@ -88,7 +74,7 @@ ENTRY(xen_sysret32)
88 * still with the kernel gs, so we can easily switch back 74 * still with the kernel gs, so we can easily switch back
89 */ 75 */
90 movq %rsp, PER_CPU_VAR(rsp_scratch) 76 movq %rsp, PER_CPU_VAR(rsp_scratch)
91 movq PER_CPU_VAR(kernel_stack), %rsp 77 movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
92 78
93 pushq $__USER32_DS 79 pushq $__USER32_DS
94 pushq PER_CPU_VAR(rsp_scratch) 80 pushq PER_CPU_VAR(rsp_scratch)
@@ -128,7 +114,7 @@ RELOC(xen_sysret32, 1b+1)
128/* Normal 64-bit system call target */ 114/* Normal 64-bit system call target */
129ENTRY(xen_syscall_target) 115ENTRY(xen_syscall_target)
130 undo_xen_syscall 116 undo_xen_syscall
131 jmp system_call_after_swapgs 117 jmp entry_SYSCALL_64_after_swapgs
132ENDPROC(xen_syscall_target) 118ENDPROC(xen_syscall_target)
133 119
134#ifdef CONFIG_IA32_EMULATION 120#ifdef CONFIG_IA32_EMULATION
@@ -136,13 +122,13 @@ ENDPROC(xen_syscall_target)
136/* 32-bit compat syscall target */ 122/* 32-bit compat syscall target */
137ENTRY(xen_syscall32_target) 123ENTRY(xen_syscall32_target)
138 undo_xen_syscall 124 undo_xen_syscall
139 jmp ia32_cstar_target 125 jmp entry_SYSCALL_compat
140ENDPROC(xen_syscall32_target) 126ENDPROC(xen_syscall32_target)
141 127
142/* 32-bit compat sysenter target */ 128/* 32-bit compat sysenter target */
143ENTRY(xen_sysenter_target) 129ENTRY(xen_sysenter_target)
144 undo_xen_syscall 130 undo_xen_syscall
145 jmp ia32_sysenter_target 131 jmp entry_SYSENTER_compat
146ENDPROC(xen_sysenter_target) 132ENDPROC(xen_sysenter_target)
147 133
148#else /* !CONFIG_IA32_EMULATION */ 134#else /* !CONFIG_IA32_EMULATION */
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9e195c683549..c20fe29e65f4 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -134,7 +134,9 @@ DECL_ASM(void, xen_restore_fl_direct, unsigned long);
134 134
135/* These are not functions, and cannot be called normally */ 135/* These are not functions, and cannot be called normally */
136__visible void xen_iret(void); 136__visible void xen_iret(void);
137#ifdef CONFIG_X86_32
137__visible void xen_sysexit(void); 138__visible void xen_sysexit(void);
139#endif
138__visible void xen_sysret32(void); 140__visible void xen_sysret32(void);
139__visible void xen_sysret64(void); 141__visible void xen_sysret64(void);
140__visible void xen_adjust_exception_frame(void); 142__visible void xen_adjust_exception_frame(void);
diff --git a/arch/xtensa/include/asm/io.h b/arch/xtensa/include/asm/io.h
index fe1600a09438..c39bb6e61911 100644
--- a/arch/xtensa/include/asm/io.h
+++ b/arch/xtensa/include/asm/io.h
@@ -59,6 +59,7 @@ static inline void __iomem *ioremap_cache(unsigned long offset,
59} 59}
60 60
61#define ioremap_wc ioremap_nocache 61#define ioremap_wc ioremap_nocache
62#define ioremap_wt ioremap_nocache
62 63
63static inline void __iomem *ioremap(unsigned long offset, unsigned long size) 64static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
64{ 65{