aboutsummaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAge
...
| * | | | | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2008-11-07
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, xen: fix use of pgd_page now that it really does return a page
| | * | | | | x86, xen: fix use of pgd_page now that it really does return a pageJeremy Fitzhardinge2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix 32-bit Xen guest boot crash On 32-bit PAE, pud_page, for no good reason, didn't really return a struct page *. Since Jan Beulich's fix "i386/PAE: fix pud_page()", pud_page does return a struct page *. Because PAE has 3 pagetable levels, the pud level is folded into the pgd level, so pgd_page() is the same as pud_page(), and now returns a struct page *. Update the xen/mmu.c code which uses pgd_page() accordingly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | xen: make sure stray alias mappings are gone before pinningJeremy Fitzhardinge2008-11-07
| | |_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xen requires that all mappings of pagetable pages are read-only, so that they can't be updated illegally. As a result, if a page is being turned into a pagetable page, we need to make sure all its mappings are RO. If the page had been used for ioremap or vmalloc, it may still have left over mappings as a result of not having been lazily unmapped. This change makes sure we explicitly mop them all up before pinning the page. Unlike aliases created by kmap, the there can be vmalloc aliases even for non-high pages, so we must do the flush unconditionally. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2008-11-06
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "x86: default to reboot via ACPI" x86: align DirectMap in /proc/meminfo AMD IOMMU: fix lazy IO/TLB flushing in unmap path x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR x86: remove VISWS and PARAVIRT around NR_IRQS puzzle x86: mention ACPI in top-level Kconfig menu x86: size NR_IRQS on 32-bit systems the same way as 64-bit x86: don't allow nr_irqs > NR_IRQS x86/docs: remove noirqbalance param docs x86: don't use tsc_khz to calculate lpj if notsc is passed x86, voyager: fix smp_intr_init() compile breakage AMD IOMMU: fix detection of NP capable IOMMUs
| | * | | | Revert "x86: default to reboot via ACPI"Eduardo Habkost2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit c7ffa6c26277b403920e2255d10df849bd613380. the assumptio of this change was that this would not break any existing machine. Andrey Borzenkov reported troubles with the ACPI reboot method: the system would hang on reboot, necessiating a power cycle. Probably more systems are affected as well. Also, there are patches queued up for v2.6.29 to disable virtualization on emergency_restart() - which was the original motivation of this change. Reported-by: Andrey Borzenkov <arvidjaar@mail.ru> Bisected-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | x86: align DirectMap in /proc/meminfoHugh Dickins2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: right-align /proc/meminfo consistent with other fields When the split-LRU patches added Inactive(anon) and Inactive(file) lines to /proc/meminfo, all counts were moved two columns rightwards to fit in. Now move x86's DirectMap lines two columns rightwards to line up. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | Merge branch 'iommu-fixes-2.6.28' of ↵Ingo Molnar2008-11-06
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent
| | | * | | | AMD IOMMU: fix lazy IO/TLB flushing in unmap pathJoerg Roedel2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lazy flushing needs to take care of the unmap path too which is not yet implemented and leads to stale IO/TLB entries. This is fixed by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | | | AMD IOMMU: fix detection of NP capable IOMMUsJoerg Roedel2008-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the code to use IOMMU_CAP_NPCACHE as a shift and not as a mask. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | * | | | | x86: add smp_mb() before sending INVALIDATE_TLB_VECTORSuresh Siddha2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix rare x2apic hang On x86, x2apic mode accesses for sending IPI's don't have serializing semantics. If the IPI receivner refers(in lock-free fashion) to some memory setup by the sender, the need for smp_mb() before sending the IPI becomes critical in x2apic mode. Add the smp_mb() in native_flush_tlb_others() before sending the IPI. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | x86: remove VISWS and PARAVIRT around NR_IRQS puzzleYinghai Lu2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix warning message when PARAVIRT is set in config Remove stale #ifdef components from our IRQ sizing logic. x86/Voyager is the only holdout. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | x86: mention ACPI in top-level Kconfig menuBjorn Helgaas2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: clarify menuconfig text Mention ACPI in the top-level menu to give a clue as to where it lives. This matches what ia64 does. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | x86: size NR_IRQS on 32-bit systems the same way as 64-bitYinghai Lu2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: make NR_IRQS big enough for system with lots of apic/pins If lots of IO_APIC's are there (or can be there), size the same way as 64-bit, depending on MAX_IO_APICS and NR_CPUS. This fixes the boot problem reported by Ben Hutchings on a 32-bit server with 5 IO-APICs and 240 IO-APIC pins. Signed-off-by: Yinghai <yinghai@kernel.org> Tested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | x86: don't allow nr_irqs > NR_IRQSBen Hutchings2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can return a value larger than NR_IRQS. This will lead to probe_irq_on() overrunning the irq_desc array. I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a Supermicro dual Xeon system. NR_IRQS is 224 but probe_nr_irqs() detects 5 IOAPICs and returns 240. Here are the log messages: Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0]) Tue Nov 4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24]) Tue Nov 4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48]) Tue Nov 4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72]) Tue Nov 4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96]) Tue Nov 4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119 Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Tue Nov 4 16:53:47 2008 Enabling APIC mode: Flat. Using 5 I/O APICs Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | x86: don't use tsc_khz to calculate lpj if notsc is passedAlok Kataria2008-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix udelay when "notsc" boot parameter is passed With notsc passed on commandline, tsc may not be used for udelays, make sure that we do not use tsc_khz to calculate the lpj value in such cases. Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | x86, voyager: fix smp_intr_init() compile breakageJames Bottomley2008-11-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix x86/Voyager build Looks like this became static on the rest of x86. Fix it up by adding an external definition to mach-voyager/setup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2008-11-06
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/home/rmk/linux-2.6-arm: [ARM] xsc3: fix xsc3_l2_inv_range [ARM] mm: fix page table initialization [ARM] fix naming of MODULE_START / MODULE_END ARM: OMAP: Fix define for twl4030 irqs ARM: OMAP: Fix get_irqnr_and_base to clear spurious interrupt bits ARM: OMAP: Fix debugfs_create_*'s error checking method for arm/plat-omap ARM: OMAP: Fix compiler warnings in gpmc.c [ARM] fix VFP+softfloat binaries
| | * \ \ \ \ \ Merge branch 'omap-fixes' of ↵Russell King2008-11-06
| | |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
| | | * | | | | | ARM: OMAP: Fix define for twl4030 irqsTony Lindgren2008-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise twl4030 gpios won't work. Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | | ARM: OMAP: Fix get_irqnr_and_base to clear spurious interrupt bitsTony Lindgren2008-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On omap24xx, INTCPS_SIR_IRQ_OFFSET bits [6:0] contains the current active interrupt number. However, on 34xx INTCPS_SIR_IRQ_OFFSET bits [31:7] also contains the SPURIOUSIRQFLAG, which gets set if the interrupt sorting information is invalid. If the SPURIOUSIRQFLAG bits are not ignored, the interrupt code will occasionally produce a bunch of confusing errors: irq -33, desc: c02ddcc8, depth: 0, count: 0, unhandled: 0 ->handle_irq(): c006f23c, handle_bad_irq+0x0/0x22c ->chip(): 00000000, 0x0 ->action(): 00000000 Fix this by masking out only the ACTIVEIRQ bits. Also fix a confusing comment. Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | | ARM: OMAP: Fix debugfs_create_*'s error checking method for arm/plat-omapZhaolei2008-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | debugfs_create_*() returns NULL if an error occurs, returns -ENODEV when debugfs is not enabled in the kernel. Comparing to PATCH v1, because clk_debugfs_init is included in "#if defined CONFIG_DEBUG_FS", we only need to check NULL return. Thanks Li Zefan <lizf@cn.fujitsu.com> debugfs_create_u8() and other function's return value's checking method are also fixed in this patch. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | | * | | | | | ARM: OMAP: Fix compiler warnings in gpmc.cSanjeev Premi2008-11-04
| | | |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix these compiler warnings: gpmc.c: In function 'gpmc_init': gpmc.c:432: warning: 'return' with a value, in function returning void gpmc.c:439: warning: 'return' with a value, in function returning void Signed-off-by: Sanjeev Premi <premi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | | | Merge branch 'fixes' of ↵Russell King2008-11-06
| | |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/djbw/xscaleiop
| | | * | | | | | [ARM] xsc3: fix xsc3_l2_inv_rangeDan Williams2008-11-06
| | | |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When 'start' and 'end' are less than a cacheline apart and 'start' is unaligned we are done after cleaning and invalidating the first cacheline. So check for (start < end) which will not walk off into invalid address ranges when (start > end). This issue was caught by drivers/dma/dmatest. 2.6.27 is susceptible. Cc: <stable@kernel.org> Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Cc: Lothar WaÃ<9f>mann <LW@KARO-electronics.de> Cc: Lennert Buytenhek <buytenh@marvell.com> Cc: Eric Miao <eric.miao@marvell.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | | | | [ARM] mm: fix page table initializationRussell King2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a result of the ptebits changes, we ended up marking device mappings as normal memory on ARMv7 CPUs, resulting in undesirable behaviour with serial ports and the like. While reviewing the section mapping table entries, other errors in the memory type settings for devices were detected and confirmed to prevent Xscale3 platforms booting. Tested on: OMAP34xx (ARMv7), OMAP24xx (ARMv6), OMAP16xx (ARM926T, ARMv5), PXA311 (Xscale3), PXA272 (Xscale), PXA255 (Xscale), IXP42x (Xscale), S3C2410 (ARM920T, ARMv4T), ARM720T (ARMv4T) StrongARM-110 (ARMv4) Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Tested-by: Mike Rapoport <mike@compulab.co.il> Tested-by: Ben Dooks <ben-linux@fluff.org> Tested-by: Anders Grafström <grfstrm@users.sourceforge.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | * | | | | | [ARM] fix naming of MODULE_START / MODULE_ENDRussell King2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of 73bdf0a60e607f4b8ecc5aec597105976565a84f, the kernel needs to know where modules are located in the virtual address space. On ARM, we located this region between MODULE_START and MODULE_END. Unfortunately, everyone else calls it MODULES_VADDR and MODULES_END. Update ARM to use the same naming, so is_vmalloc_or_module_addr() can work properly. Also update the comment on mm/vmalloc.c to reflect that ARM also places modules in a separate region from the vmalloc space. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| | * | | | | | [ARM] fix VFP+softfloat binariesRussell King2008-11-04
| | | |_|/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2.6.28-rc tightened up the ELF architecture checks on ARM. For non-EABI it only allows VFP if the hardware supports it. However, the kernel fails to also inspect the soft-float flag, so it incorrectly rejects binaries using soft-VFP. The fix is simple: also check that EF_ARM_SOFT_FLOAT isn't set before rejecting VFP binaries on non-VFP hardware. Acked-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | | Merge branch 'merge' of ↵Linus Torvalds2008-11-06
| |\ \ \ \ \ \ | | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: powerpc: Fix "unused variable" warning in pci_dlpar.c powerpc/cell: Fix compile error in ras.c powerpc/ps3: Fix compile error in ps3-lpm.c
| * | | | | | sched: re-tune balancingIngo Molnar2008-11-05
| | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: improve wakeup affinity on NUMA systems, tweak SMP systems Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain balancing defaults on NUMA and SMP systems. Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason why we would not want to have wakeup affinity across nodes as well. (we already do this in the standard NUMA template.) lat_ctx on a NUMA box is particularly happy about this change: before: | phoenix:~/l> ./lat_ctx -s 0 2 | "size=0k ovr=2.60 | 2 5.70 after: | phoenix:~/l> ./lat_ctx -s 0 2 | "size=0k ovr=2.65 | 2 2.07 a 2.75x speedup. pipe-test is similarly happy about it too: | phoenix:~/sched-tests> ./pipe-test | 18.26 usecs/loop. | 14.70 usecs/loop. | 14.38 usecs/loop. | 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1 | 8.63 usecs/loop. | 8.59 usecs/loop. | 9.03 usecs/loop. | 8.94 usecs/loop. | 8.96 usecs/loop. | 8.63 usecs/loop. Also: - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings) - enable SD_WAKE_BALANCE on SMP domains Sysbench+postgresql improves all around the board, quite significantly: .28-rc3-11474e2c .28-rc3-11474e2c-tune ------------------------------------------------- 1: 571 688 +17.08% 2: 1236 1206 -2.55% 4: 2381 2642 +9.89% 8: 4958 5164 +3.99% 16: 9580 9574 -0.07% 32: 7128 8118 +12.20% 64: 7342 8266 +11.18% 128: 7342 8064 +8.95% 256: 7519 7884 +4.62% 512: 7350 7731 +4.93% ------------------------------------------------- SUM: 55412 59341 +6.62% So it's a win both for the runup portion, the peak area and the tail. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | | powerpc: Use the new byteorder headersHarvey Harrison2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/boot: Allocate more memory for dtbSebastian Siewior2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Gibson suggested that since we are now unconditionally copying the dtb into a malloc()ed buffer, it would be sensible to add a little padding to the buffer at that point, so that further device tree manipulations won't need to reallocate it. This implements that suggestion. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Hugetlb pgtable cache access cleanupJon Tollefson2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrew Morton suggested that using a macro that makes an array reference look like a function call makes it harder to understand the code. This therefore removes the huge_pgtable_cache(psize) macro and replaces its uses with pgtable_cache[HUGE_PGTABLE_INDEX(psize)]. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/ps3: Fix memory leak in device initMasakazu Mokuno2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Free dynamically allocated device data structures when device registration fails. This fixes memory leakage when the registration fails. Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp> Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Eliminate unused do_gtod variablePaul Mackerras2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we started using the generic timekeeping code, we haven't had a powerpc-specific version of do_gettimeofday, and hence there is now nothing that reads the do_gtod variable in arch/powerpc/kernel/time.c. This therefore removes it and the code that sets it. Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Improve resolution of VDSO clock_gettimePaul Mackerras2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the clock_gettime implementation in the VDSO produces a result with microsecond resolution for the cases that are handled without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC. The nanoseconds field of the result is obtained by computing a microseconds value and multiplying by 1000. This changes the code in the VDSO to do the computation for clock_gettime with nanosecond resolution. That means that the resolution of the result will ultimately depend on the timebase frequency. Because the timestamp in the VDSO datapage (stamp_xsec, the real time corresponding to the timebase count in tb_orig_stamp) is in units of 2^-20 seconds, it doesn't have sufficient resolution for computing a result with nanosecond resolution. Therefore this adds a copy of xtime to the VDSO datapage and updates it in update_gtod() along with the other time-related fields. Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Remove map_/unmap_single() from dma_mapping_opsMark Nelson2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that all of the remaining dma_mapping_ops have had their map_/unmap_single functions updated to become map/unmap_page functions, there is no need to have the map_/unmap_single function pointers in the dma_mapping_ops. So, this removes them and also removes the code that does the checking for which set of functions to use. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Acked-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pci: Cosmetic cleanups of pci-common.cBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does a few cosmetic cleanups, moving a couple of things around but without actually changing what the code does. (There is a minor change in ordering of operations in pcibios_setup_bus_devices but it should have no impact). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pci: Fix various pseries PCI hotplug issuesBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pseries PCI hotplug code has a number of issues, ranging from incorrect resource setup to crashes, depending on what is added, when, whether it contains a bridge, etc etc.... This fixes a whole bunch of these, while actually simplifying the code a bit, using more generic code in the process and factoring out common code between adding of a PHB, a slot or a device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pci: Make pcibios_allocate_bus_resources more robustBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To properly fix PCI hotplug, it's useful to be able to make the fixup passes on all devices whether they were just hot plugged or already there. However, pcibios_allocate_bus_resources() wouldn't cope well with being called twice for a given bus. This makes it ignore resources that have already been allocated, along with adding a bit of debug output. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/eeh: Make EEH device add/remove more robustBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To properly fix PCI hotplug, it's useful to be able to make the fixup passes on all devices whether they were just hot plugged or already there. The EEH code however used to not be very friendly with calling eeh_add_device_late() multiple time, and not very rebust in the way it generally tests whether a device is in the expected state vs. the EEH code. This improves it, along with cleaning up a couple of debug printk's. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pci: Split pcibios_fixup_bus() into bus setup and device setupBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, our PCI code uses the pcibios_fixup_bus() callback, which is called by the generic code when probing PCI buses, for two different things. One is to set up things related to the bus itself, such as reading bridge resources for P2P bridges, fixing them up, or setting up the iommu's associated with bridges on some platforms. The other is some setup for each individual device under that bridge, mostly setting up DMA mappings and interrupts. The problem is that this approach doesn't work well with PCI hotplug when an existing bus is re-probed for new children. We fix this problem by splitting pcibios_fixup_bus into two routines: pcibios_setup_bus_self() is now called to setup the bus itself pcibios_setup_bus_devices() is now called to setup devices pcibios_fixup_bus() is then modified to call these two after reading the bridge bases, and the OF based PCI probe is modified to avoid calling into the first one when rescanning an existing bridge. [paulus@samba.org - fixed eeh.h for 32-bit compile now that pci-common.c is including it unconditionally.] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pci: Remove pcibios_do_bus_setup()Benjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function pcibios_do_bus_setup() was used by pcibios_fixup_bus() to perform setup that is different between the 32-bit and 64-bit code. This difference no longer exists, thus the function is removed and the setup now done directly from pci-common.c. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pci: Use common PHB resource hookupBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 32-bit and 64-bit powerpc PCI code used to set up the resource pointers of the root bus of a given PHB in completely different places. This unifies this in large part, by making 32-bit use a routine very similar to what 64-bit does when initially scanning the PCI busses. The actual setup of the PHB resources itself is then moved to a common function in pci-common.c. This should cause no functional change on 64-bit. On 32-bit, the effect is that the PHB resources are going to be setup a bit earlier, instead of being setup from pcibios_fixup_bus(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pci: Cleanup debug printk'sBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes the various DBG() macro from the powerpc PCI code and makes it use the standard pr_debug instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STDMark Nelson2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update memcpy() to add two new feature sections: one for aligning the destination before copying and one for copying using aligned load and store doubles. These new feature sections will only affect Power6 and Cell because the CPU feature bit was only added to these two processors. Power6 gets its best performance in memcpy() when aligning neither the source nor the destination, while Cell gets its best performance when just the destination is aligned. But in order to save on CPU feature bits we can use the previously added CPU_FTR_CP_USE_DCBTZ feature bit to differentiate between Power6 and Cell (because CPU_FTR_CP_USE_DCBTZ was added to Cell but not Power6). The first feature section acts to nop out the branch that takes us to the code that aligns us to an eight byte boundary for the destination. We only want to nop out this branch on Power6. So the ALT_FTR_SECTION_END() for this feature section creates a test mask of the two feature bits ORed together and provides an expected result of just CPU_FTR_UNALIGNED_LD_STD, thus we nop out the branch if we're on a CPU that has CPU_FTR_UNALIGNED_LD_STD set and CPU_FTR_CP_USE_DCBTZ unset. For the second feature section added, if we're on a CPU that has the CPU_FTR_UNALIGNED_LD_STD bit set then we don't want to do the copy with aligned loads and stores (and the appropriate shifting left and right instructions), so we want to nop out the branch to .Lsrc_unaligned. The andi. used for this branch is moved to just above the branch because this allows us to nop out both instructions with just one feature section which gives us better performance and doesn't hurt readability which two separate feature sections did. Moving the andi. to just above the branch doesn't have any noticeable negative effect on the remaining 64bit processors (the ones that didn't have this feature bit added). On Cell this simple modification results in an improvement to measured memcpy() bandwidth of up to 50% in the hot cache case and up to 15% in the cold cache case. On Power6 we get memory bandwidth results that are up to three times faster in the hot cache case and up to 50% faster in the cold cache case. Commit 2a9294369bd020db89bfdf78b84c3615b39a5c84 ("powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ") was where CPU_FTR_CP_USE_DCBTZ was added. To say that Cell gets its best performance in memcpy() with just the destination aligned is true but only for the reason that the indirect shift and rotate instructions, sld and srd, are microcoded on Cell. This means that either the destination or the source can be aligned, but not both, and seeing as we get better performance with the destination aligned we choose this option. While we're at it make a one line change from cmpldi r1,... to cmpldi cr1,... for consistency. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Add new CPU feature: CPU_FTR_UNALIGNED_LD_STDMark Nelson2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new CPU feature bit, CPU_FTR_UNALIGNED_LD_STD, to be added to the 64bit powerpc chips that can do unaligned load double and store double without any performance hit. This is added to Power6 and Cell and will be used in the next commit to disable the code that gets the destination address aligned on those CPUs where doing that doesn't improve performance. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Update page-in counter for CMMBrian King2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new field has been added to the VPA as a method for the client OS to communicate to firmware the number of page-ins it is performing when running collaborative memory overcommit. The hypervisor will use this information to better determine if a partition is experiencing memory pressure and needs more memory allocated to it. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc/pseries: Fix getting the server number sizeSebastien Dugue2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'ibm,interrupt-server#-size' properties are not in the cpu nodes, which is where we currently look for them, but rather live under the interrupt source controller nodes (which have "ibm,ppc-xics" in their compatible property). This moves the code that looks for the ibm,interrupt-server#-size properties from xics_update_irq_servers() into xics_init_IRQ(). Also this adds a check for mismatched sizes across the interrupt source controller nodes. Not sure this is necessary as in this case the firmware might be seriously busted. This property only appears on POWER6 boxes and is only used in the set-indicator(gqirm) call, and apparently firmware currently ignores the value we pass. Nevertheless we need to fix it in case future firmware versions use it. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Milton Miller <miltonm@bga.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Remove device_type = "rtc" properties in .dts filesAnton Vorontsov2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want to encourage the device_type usage. It isn't used in the code, so we can simply remove it from the dts files. Suggested-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | powerpc: Silence software timebase syncBenjamin Herrenschmidt2008-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When no hardware method is provided to sync the timebase registers across the machine, and the platform doesn't sync them for us, then we use a generic software implementation. Currently, the code for that has many printks, and they don't have log levels. Most of the printks are only useful for debugging the code, and since we haven't had any problems with it for years, this turns them into pr_debug. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>