aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms/pseries
Commit message (Collapse)AuthorAge
* const: constify remaining file_operationsAlexey Dobriyan2009-10-01
| | | | | | | | [akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powerpc: Change archdata dma_data to a unionBecky Bruce2009-09-24
| | | | | | | | | | Sometimes this is used to hold a simple offset, and sometimes it is used to hold a pointer. This patch changes it to a union containing void * and dma_addr_t. get/set accessors are also provided, because it was getting a bit ugly to get to the actual data. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* cpumask: Use accessors for cpu_*_mask: powerpcRusty Russell2009-09-23
| | | | | | | | Use the accessors rather than frobbing bits directly (the new versions are const). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
* seq_file: constify seq_operationsJames Morris2009-09-23
| | | | | | | | | | | | | | Make all seq_operations structs const, to help mitigate against revectoring user-triggerable function pointers. This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'linux-next' of ↵Linus Torvalds2009-09-16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits) PCI hotplug: clean up acpi_run_hpp() PCI hotplug: acpiphp: use generic pci_configure_slot() PCI hotplug: shpchp: use generic pci_configure_slot() PCI hotplug: pciehp: use generic pci_configure_slot() PCI hotplug: add pci_configure_slot() PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation PCI: Clear saved_state after the state has been restored PCI PM: Return error codes from pci_pm_resume() PCI: use dev_printk in quirk messages PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset() PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle PCI Hotplug: acpiphp: find bridges the easy way PCI: pcie portdrv: remove unused variable PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support ACPI PM: Replace wakeup.prepared with reference counter PCI PM: Introduce device flag wakeup_prepared PCI / ACPI PM: Rework some debug messages PCI PM: Simplify PCI wake-up code ... Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree scanning having been moved and merged for the 32- and 64-bit cases. The 'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
| * PCI/powerpc: support PCIe fundamental resetMike Mason2009-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | By default, the EEH framework on powerpc does what's known as a "hot reset" during recovery of a PCI Express device. We've found a case where the device needs a "fundamental reset" to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch makes changes to EEH to utilize the new bit field. Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Signed-off-by: Richard Lary <rlary@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | powerpc: Fix bug where perf_counters breaks oprofilePaul Mackerras2009-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is a bug where if you use oprofile on a pSeries machine, then use perf_counters, then use oprofile again, oprofile will not work correctly; it will lose the PMU configuration the next time the hypervisor does a partition context switch, and thereafter won't count anything. Maynard Johnson identified the sequence causing the problem: - oprofile setup calls ppc_enable_pmcs(), which calls pseries_lpar_enable_pmcs, which tells the hypervisor that we want to use the PMU, and sets the "PMU in use" flag in the lppaca. This flag tells the hypervisor whether it needs to save and restore the PMU config. - The perf_counter code sets and clears the "PMU in use" flag directly as it context-switches the PMU between tasks, and leaves it clear when it finishes. - oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs, which does nothing because it has already been called. In particular it doesn't set the "PMU in use" flag. This fixes the problem by arranging for ppc_enable_pmcs to always set the "PMU in use" flag. It makes the perf_counter code call ppc_enable_pmcs also rather than calling the lower-level function directly, and removes the setting of the "PMU in use" flag from pseries_lpar_enable_pmcs, since that is now done in its caller. This also removes the declaration of pasemi_enable_pmcs because it isn't defined anywhere. Reported-by: Maynard Johnson <mpjohn@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: <stable@kernel.org) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | powerpc/pseries: Fix to handle slb resize across migrationBrian King2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()Grant Likely2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | The two versions are doing almost exactly the same thing. No need to maintain them as separate files. This patch also has the side effect of making the PCI device tree scanning code available to 32 bit powerpc machines, but no board ports actually make use of this feature at this point. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | powerpc: Move definitions of secondary CPU spinloop to header fileBenjamin Herrenschmidt2009-08-19
|/ | | | | | | Those definitions are currently declared extern in the .c file where they are used, move them to a header file instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/pseries: Use pr_devel() in xics.cMichael Ellerman2009-07-07
| | | | | | | | | | | | | | | | | | pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 7720 5488 296 13504 34c0 platforms/pseries/xics.o size after: text data bss dec hex filename 7535 5456 296 13287 33e7 platforms/pseries/xics.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/pseries: Use pr_devel() in pseries LPAR HPTE routinesMichael Ellerman2009-07-07
| | | | | | | | | | | | | | | | | | | | | | pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. In particular, pSeries_lpar_hpte_insert() goes from 185 instructions to 77 instructions as a result of this patch. Luckily that code isn't called very often ... With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 7284 1552 296 9132 23ac platforms/pseries/lpar.o size after: text data bss dec hex filename 5806 1096 296 7198 1c1e platforms/pseries/lpar.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Use one common impl. of RTAS timebase sync and use raw spinlockBenjamin Herrenschmidt2009-06-26
| | | | | | | | | | | Several platforms use their own copy of what is essentially the same code, using RTAS to synchronize the timebases when bringing up new CPUs. This moves it all into a single common implementation and additionally turns the spinlock into a raw spinlock since the former can rely on the timebase not being frozen when spinlock debugging is enabled, and finally masks interrupts while the timebase is disabled. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge branch 'linux-next' of ↵Linus Torvalds2009-06-22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (74 commits) PCI: make msi_free_irqs() to use msix_mask_irq() instead of open coded write PCI: Fix the NIU MSI-X problem in a better way PCI ASPM: remove get_root_port_link PCI ASPM: cleanup pcie_aspm_sanity_check PCI ASPM: remove has_switch field PCI ASPM: cleanup calc_Lx_latency PCI ASPM: cleanup pcie_aspm_get_cap_device PCI ASPM: cleanup clkpm checks PCI ASPM: cleanup __pcie_aspm_check_state_one PCI ASPM: cleanup initialization PCI ASPM: cleanup change input argument of aspm functions PCI ASPM: cleanup misc in struct pcie_link_state PCI ASPM: cleanup clkpm state in struct pcie_link_state PCI ASPM: cleanup latency field in struct pcie_link_state PCI ASPM: cleanup aspm state field in struct pcie_link_state PCI ASPM: fix typo in struct pcie_link_state PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS PCI: remove redundant __msi_set_enable() PCI PM: consistently use type bool for wake enable variable x86/ACPI: Correct maximum allowed _CRS returned resources and warn if exceeded ...
| * PCI AER: support Multiple Error Received and no error source idZhang, Yanmin2009-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on PCI Express AER specs, a root port might receive multiple TLP errors while it could only save a correctable error source id and an uncorrectable error source id at the same time. In addition, some root port hardware might be unable to provide a correct source id, i.e., the source id, or the bus id part of the source id provided by root port might be equal to 0. The patchset implements the support in kernel by searching the device tree under the root port. Patch 1 changes parameter cb of function pci_walk_bus to return a value. When cb return non-zero, pci_walk_bus stops more searching on the device tree. Reviewed-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | Merge commit 'origin/master' into nextBenjamin Herrenschmidt2009-06-12
|\ \ | | | | | | | | | | | | Manual merge of: arch/powerpc/kernel/asm-offsets.c
| * | irq: change ->set_affinity() to return statusYinghai Lu2009-04-28
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | according to Ingo, change set_affinity() in irq_chip should return int, because that way we can handle failure cases in a much cleaner way, in the genirq layer. v2: fix two typos [ Impact: extend API ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-arch@vger.kernel.org LKML-Reference: <49F654E9.4070809@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | powerpc/pseries: Fix warnings when printing resource_size_tStephen Rothwell2009-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | resource_size_t is 64 bits on pseries Gets rid of these warnings: arch/powerpc/platforms/pseries/iommu.c: In function 'pci_dma_bus_setup_pSeries': arch/powerpc/platforms/pseries/iommu.c:391: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'resource_size_t' arch/powerpc/platforms/pseries/iommu.c:417: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'resource_size_t' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | powerpc: Convert RTAS event scan from kernel thread to workqueueAnton Blanchard2009-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RTAS event scan has to run across all cpus. Right now we use a kernel thread and set_cpus_allowed but in doing so we wake up the previous cpu unnecessarily. Some ftrace output shows this: previous cpu (2): [002] 7.022331: sched_switch: task swapper:0 [140] ==> rtasd:194 [120] [002] 7.022338: sched_switch: task rtasd:194 [120] ==> migration/2:9 [0] [002] 7.022344: sched_switch: task migration/2:9 [0] ==> swapper:0 [140] next cpu (3): [003] 7.022345: sched_switch: task swapper:0 [140] ==> rtasd:194 [120] [003] 7.022371: sched_switch: task rtasd:194 [120] ==> swapper:0 [140] We can use schedule_delayed_work_on and avoid the unnecessary wakeup. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | powerpc/pci: Move pseries code into pseries platform specific areaKumar Gala2009-05-21
| | | | | | | | | | | | | | | | | | There doesn't appear to be any specific reason that we need to setup the pseries specific notifier in generic arch pci code. Move it into pseries land. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | powerpc/pseries: CMO unused page hintingRobert Jennings2009-05-21
|/ | | | | | | | | | | | | | | Adds support for the "unused" page hint which can be used in shared memory partitions to flag pages not in use, which will then be stolen before active pages by the hypervisor when memory needs to be moved to LPARs in need of additional memory. Failure to mark pages as 'unused' makes the LPAR slower to give up unused memory to other partitions. This adds the kernel parameter 'cmo_free_hint' to disable this functionality. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: pseries/dtl.c should include asm/firmware.hSachin Sant2009-04-15
| | | | | | | | | | | | | | A randconfig build on powerpc failed with: dtl.c: In function 'dtl_init': dtl.c:238: error: implicit declaration of function 'firmware_has_feature' dtl.c:238: error: 'FW_FEATURE_SPLPAR' undeclared (first use in this function) - We need firmware.h for these definitions. Signed-off-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()Mike Mason2009-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While adding native EEH support to Emulex and Qlogic drivers, it was discovered that dev->error_state was set to pci_io_channel_normal too late in the recovery process. These drivers rely on error_state to determine if they can access the device in their slot_reset callback, thus error_state needs to be set to pci_io_channel_normal in eeh_report_reset(). Below is a detailed explanation (courtesy of Richard Lary) as to why this is necessary. Background: PCI MMIO or DMA accesses to a frozen slot generate additional EEH errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the adapter will be shutdown. To avoid triggering excessive EEH errors and an undesirable adapter shutdown, some drivers use the pci_channel_offline(dev) wrapper function to return a Boolean value based on the value of pci_dev->error_state to determine if PCI MMIO or DMA accesses are safe. If the wrapper returns TRUE, drivers must not make PCI MMIO or DMA access to their hardware. The pci_dev structure member error_state reflects one of three values, 1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3) pci_channel_io_perm_failure. Function pci_channel_offline(dev) returns TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure. The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the point where the PCI slot is frozen. Currently, the EEH driver restores dev->error_state to pci_channel_io_normal in eeh_report_resume() before calling the driver's resume callback. However, when the EEH driver calls the driver's slot_reset callback() from eeh_report_reset(), it incorrectly indicates the error state is still pci_channel_io_frozen. Waiting until eeh_report_resume() to restore dev->error_state to pci_channel_io_normal is too late for Emulex and QLogic FC drivers and any other drivers which are designed to use common code paths in these two cases: i) those called after the driver's slot_reset callback() and ii) those called after the PCI slot is frozen but before the driver's slot_reset callback is called. Case i) all driver paths executed to reinitialize the hardware after a reset and case ii) all code paths executed by driver kernel threads that run asynchronous to the main driver thread, such as interrupt handlers and worker threads to process driver work queues. Emulex and QLogic FC drivers are designed with common code paths which require that pci_channel_offline(dev) reflect the true state of the hardware. The state transitions that the hardware takes from Normal Operations to Slot Frozen to Reset to Normal Operations are documented in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75. PE State Control. PAPR defines the following 3 states: 0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed (Normal Operations) 1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled 2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled (Slot Frozen) An EEH error places the slot in state 2 (Frozen) and the adapter driver is notified that an EEH error was detected. If the adapter driver returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls eeh_reset_device() to place the slot into state 1 (Reset) and eeh_reset_device completes by placing the slot into State 0 (Normal Operations). Upon return from eeh_reset_device(), the EEH driver calls eeh_report_reset, which then calls the adapter's slot_reset callback. At the time the adapter's slot_reset callback is called, the true state of the hardware is Normal Operations and should be accurately reflected by setting dev->error_state to pci_channel_io_normal. The current implementation of EEH driver does not do so and requires this change to correct this deficiency. Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Acked-by: Linas Vepstas <linasvepstas@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* Merge commit 'origin/master' into nextBenjamin Herrenschmidt2009-03-29
|\ | | | | | | | | | | Manual merge of: arch/powerpc/include/asm/elf.h drivers/i2c/busses/i2c-mpc.c
| * Merge branch 'linus' into x86/apicIngo Molnar2009-02-13
| |\ | | | | | | | | | | | | | | | Conflicts: arch/x86/kernel/acpi/boot.c arch/x86/mm/fault.c
| * \ Merge branch 'core/percpu' into x86/coreIngo Molnar2009-01-28
| |\ \ | | | | | | | | | | | | | | | | Conflicts: kernel/irq/handle.c
| | * \ Merge branch 'x86/mm' into core/percpuIngo Molnar2009-01-21
| | |\ \ | | | | | | | | | | | | | | | | | | | | Conflicts: arch/x86/mm/fault.c
| | * | | irq: update all arches for new irq_descMike Travis2009-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: cleanup, update to new cpumask API Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's so access to them should be using the new cpumask API. Signed-off-by: Mike Travis <travis@sgi.com>
* | | | | powerpc: Add write barrier before enabling DTL flagsJeremy Kerr2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we don't enforce any ordering for updates to the lppaca when enabling dtl logging, so we may end up enabling logging before the index fields have been established. This change adds a smp_wmb() before doing the actual enable. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc: Add virtual processor dispatch trace logJeremy Kerr2009-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pseries SPLPAR machines are able to retrieve a log of dispatch and preempt events from the hypervisor. With this information, we can see when and why each dispatch & preempt is occuring. This change adds a set of debugfs files allowing userspace to read this dispatch log. Based on initial patches from Nishanth Aravamudan <nacc@us.ibm.com>. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Failed reconfig notifier chain call cleanupNathan Fontenot2009-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return code from invoking the notifier chain when updating the ibm,dynamic-memory property is not handled properly. In failure cases (rc == NOTIFY_BAD) we should be restoring the original value of the property. In success (rc == NOTIFY_OK) we should be returning zero from the calling routine. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/kconfig: Kill PPC_MULTIPLATFORMBenjamin Herrenschmidt2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't really meaningful anymore. It was basically equivalent to PPC64 || 6xx. This removes it along with the following changes: - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely on 6xx which is what they want anyway. - A new symbol, PPC_BOOK3S, is defined that represent compliance with the "Server" variant of the architecture. This is set when either 6xx or PPC64 is set and open the door for future BOOK3E 64-bit. - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use PPC64 && PPC_BOOK3S - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now used to control the use of prom_init.c Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: The pseries MSI code depends on EEHMichael Ellerman2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Reject discontiguous/non-zero based MSI-X requestsMichael Ellerman2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no way for us to express to firmware that we want a discontiguous, or non-zero based, range of MSI-X entries. So we must reject such requests. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Implement a quota system for MSIsMichael Ellerman2009-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are hardware limitations on the number of available MSIs, which firmware expresses using a property named "ibm,pe-total-#msi". This property tells us how many MSIs are available for devices below the point in the PCI tree where we find the property. For old firmwares which don't have the property, we assume there are 8 MSIs available per "partitionable endpoint" (PE). The PE can be found using existing EEH code, which uses the methods described in PAPR. For our purposes we want the parent of the node that's identified using this method. When a driver requests n MSIs for a device, we first establish where the "ibm,pe-total-#msi" property above that device is, or we find the PE if the property is not found. In both cases we call this node the "pe_dn". We then count all non-bridge devices below the pe_dn, to establish how many devices in total may need MSIs. The quota is then simply the total available divided by the number of devices, if the request is less than or equal to the quota, the request is fine and we're done. If the request is greater than the quota, we try to determine if there are any "spare" MSIs which we can give to this device. Spare MSIs are found by looking for other devices which can never use their full quota, because their "req#msi(-x)" property is less than the quota. If we find any spare, we divide the spares by the number of devices that could request more than their quota. This ensures the spare MSIs are spread evenly amongst all over-quota requestors. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Return req#msi(-x) if request is largerMichael Ellerman2009-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a driver asks for more MSIs than the devices "req#msi(-x)" property, we currently return -ENOSPC. This doesn't give the driver any chance to make a new request with a number that might work. So if "req#msi(-x)" is less than the request, return its value. To be 100% safe, make sure we return an error if req_msi == 0. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/eeh: Only disable/enable LSI interrupts in EEHMike Mason2009-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EEH code disables and enables interrupts during the device recovery process. This is unnecessary for MSI and MSI-X interrupts because they are effectively disabled by the DMA Stopped state when an EEH error occurs. The current code is also incorrect for MSI-X interrupts. It doesn't take into account that MSI-X interrupts are tracked in a different way than LSI/MSI interrupts. This patch ensures only LSI interrupts are disabled/enabled. Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Acked-by: Linas Vepstas <linasvepstas@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Return the number of MSIs we could allocateMichael Ellerman2009-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we can't allocate the requested number of MSIs, we can still tell the generic code how many we were able to allocate. That can then be passed onto the driver, allowing it to request that many in future, and probably succeeed. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Check for MSI-X also in rtas_msi_pci_irq_fixup()Michael Ellerman2009-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We also need to check that the device isn't using MSI-X in the irq fixup routine, otherwise we might leave MSI-Xs configured at boot. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Add support for ibm,req#msi-xMichael Ellerman2009-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Firmware encodes the number of MSI-X requested by a device in a different property than for MSI. Pull the property name out as a parameter and share the logic for both cases. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Fix MSI-X interrupt queryingMichael Ellerman2009-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to increment i in the loop that queries what interrupts firmware gave us, otherwise we'll incorrectly use the first value over and over. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | powerpc/pseries: Remove write only variable in PCI DLPARMilton Miller2009-02-10
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | Since we never hotplug add an isa bus, we never need to set primary. Delete this write-only variable. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | powerpc: Add missing sparsemem.h includeMichael Neuling2009-02-09
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | arch/powerpc/platforms/pseries/hotplug-memory.c uses remove_section_mapping() but doesn't include sparsemem.h which defines it. This can cause compilation fails for some configs. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | / powerpc: Printing fix for l64 to ll64 conversion: phyp_dump.cStephen Rothwell2009-01-28
| |/ |/| | | | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | powerpc: Change u64/s64 to a long long integer typeIngo Molnar2009-01-12
|/ | | | | | | | | | | | | | | | | | | | | | Convert arch/powerpc/ over to long long based u64: -#ifdef __powerpc64__ -# include <asm-generic/int-l64.h> -#else -# include <asm-generic/int-ll64.h> -#endif +#include <asm-generic/int-ll64.h> This will avoid reoccuring spurious warnings in core kernel code that comes when people test on their own hardware. (i.e. x86 in ~98% of the cases) This is what x86 uses and it generally helps keep 64-bit code 32-bit clean too. [Adjusted to not impact user mode (from paulus) - sfr] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge branch 'cpus4096-for-linus-2' of ↵Linus Torvalds2009-01-02
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually
| * cpumask: make irq_set_affinity() take a const struct cpumaskRusty Russell2008-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: change existing irq_chip API Not much point with gentle transition here: the struct irq_chip's setaffinity method signature needs to change. Fortunately, not widely used code, but hits a few architectures. Note: In irq_select_affinity() I save a temporary in by mangling irq_desc[irq].affinity directly. Ingo, does this break anything? (Folded in fix from KOSAKI Motohiro) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: jeremy@xensource.com Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
| * cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and ↵Rusty Russell2008-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpulist_scnprintf to take pointers. Impact: change calling convention of existing cpumask APIs Most cpumask functions started with cpus_: these have been replaced by cpumask_ ones which take struct cpumask pointers as expected. These four functions don't have good replacement names; fortunately they're rarely used, so we just change them over. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: mingo@redhat.com Cc: tony.luck@intel.com Cc: ralf@linux-mips.org Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: cl@linux-foundation.org Cc: srostedt@redhat.com
* | Merge branch 'core-for-linus' of ↵Linus Torvalds2008-12-30
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits) stacktrace: provide save_stack_trace_tsk() weak alias rcu: provide RCU options on non-preempt architectures too printk: fix discarding message when recursion_bug futex: clean up futex_(un)lock_pi fault handling "Tree RCU": scalable classic RCU implementation futex: rename field in futex_q to clarify single waiter semantics x86/swiotlb: add default swiotlb_arch_range_needs_mapping x86/swiotlb: add default phys<->bus conversion x86: unify pci iommu setup and allow swiotlb to compile for 32 bit x86: add swiotlb allocation functions swiotlb: consolidate swiotlb info message printing swiotlb: support bouncing of HighMem pages swiotlb: factor out copy to/from device swiotlb: add arch hook to force mapping swiotlb: allow architectures to override phys<->bus<->phys conversions swiotlb: add comment where we handle the overflow of a dma mask on 32 bit rcu: fix rcutorture behavior during reboot resources: skip sanity check of busy resources swiotlb: move some definitions to header swiotlb: allow architectures to override swiotlb pool allocation ... Fix up trivial conflicts in arch/x86/kernel/Makefile arch/x86/mm/init_32.c include/linux/hardirq.h as per Ingo's suggestions.
| * | "Tree RCU": scalable classic RCU implementationPaul E. McKenney2008-12-18
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs. Although this patch creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU. This patch still handles stress better than does mainline, so I am still calling it ready for inclusion. This patch is against the -tip tree. Nevertheless, experience on an actual 1000+ CPU machine would still be most welcome. Most of the changes noted below were found while creating an rcutiny (which should permit ejecting the current rcuclassic) and while doing detailed line-by-line documentation. Updates from v9 (http://lkml.org/lkml/2008/12/2/334): o Fixes from remainder of line-by-line code walkthrough, including comment spelling, initialization, undesirable narrowing due to type conversion, removing redundant memory barriers, removing redundant local-variable initialization, and removing redundant local variables. I do not believe that any of these fixes address the CPU-hotplug issues that Andi Kleen was seeing, but please do give it a whirl in case the machine is smarter than I am. A writeup from the walkthrough may be found at the following URL, in case you are suffering from terminal insomnia or masochism: http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf o Made rcutree tracing use seq_file, as suggested some time ago by Lai Jiangshan. o Added a .csv variant of the rcudata debugfs trace file, to allow people having thousands of CPUs to drop the data into a spreadsheet. Tested with oocalc and gnumeric. Updated documentation to suit. Updates from v8 (http://lkml.org/lkml/2008/11/15/139): o Fix a theoretical race between grace-period initialization and force_quiescent_state() that could occur if more than three jiffies were required to carry out the grace-period initialization. Which it might, if you had enough CPUs. o Apply Ingo's printk-standardization patch. o Substitute local variables for repeated accesses to global variables. o Fix comment misspellings and redundant (but harmless) increments of ->n_rcu_pending (this latter after having explicitly added it). o Apply checkpatch fixes. Updates from v7 (http://lkml.org/lkml/2008/10/10/291): o Fixed a number of problems noted by Gautham Shenoy, including the cpu-stall-detection bug that he was having difficulty convincing me was real. ;-) o Changed cpu-stall detection to wait for ten seconds rather than three in order to reduce false positive, as suggested by Ingo Molnar. o Produced a design document (http://lwn.net/Articles/305782/). The act of writing this document uncovered a number of both theoretical and "here and now" bugs as noted below. o Fix dynticks_nesting accounting confusion, simplify WARN_ON() condition, fix kerneldoc comments, and add memory barriers in dynticks interface functions. o Add more data to tracing. o Remove unused "rcu_barrier" field from rcu_data structure. o Count calls to rcu_pending() from scheduling-clock interrupt to use as a surrogate timebase should jiffies stop counting. o Fix a theoretical race between force_quiescent_state() and grace-period initialization. Yes, initialization does have to go on for some jiffies for this race to occur, but given enough CPUs... Updates from v6 (http://lkml.org/lkml/2008/9/23/448): o Fix a number of checkpatch.pl complaints. o Apply review comments from Ingo Molnar and Lai Jiangshan on the stall-detection code. o Fix several bugs in !CONFIG_SMP builds. o Fix a misspelled config-parameter name so that RCU now announces at boot time if stall detection is configured. o Run tests on numerous combinations of configurations parameters, which after the fixes above, now build and run correctly. Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line): o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a changeset some time ago, and finally got around to retesting this option). o Fix some tracing bugs in rcupreempt that caused incorrect totals to be printed. o I now test with a more brutal random-selection online/offline script (attached). Probably more brutal than it needs to be on the people reading it as well, but so it goes. o A number of optimizations and usability improvements: o Make rcu_pending() ignore the grace-period timeout when there is no grace period in progress. o Make force_quiescent_state() avoid going for a global lock in the case where there is no grace period in progress. o Rearrange struct fields to improve struct layout. o Make call_rcu() initiate a grace period if RCU was idle, rather than waiting for the next scheduling clock interrupt. o Invoke rcu_irq_enter() and rcu_irq_exit() only when idle, as suggested by Andi Kleen. I still don't completely trust this change, and might back it out. o Make CONFIG_RCU_TRACE be the single config variable manipulated for all forms of RCU, instead of the prior confusion. o Document tracing files and formats for both rcupreempt and rcutree. Updates from v4 for those missing v5 given its bad subject line: o Separated dynticks interface so that NMIs and irqs call separate functions, greatly simplifying it. In particular, this code no longer requires a proof of correctness. ;-) o Separated dynticks state out into its own per-CPU structure, avoiding the duplicated accounting. o The case where a dynticks-idle CPU runs an irq handler that invokes call_rcu() is now correctly handled, forcing that CPU out of dynticks-idle mode. o Review comments have been applied (thank you all!!!). For but one example, fixed the dynticks-ordering issue that Manfred pointed out, saving me much debugging. ;-) o Adjusted rcuclassic and rcupreempt to handle dynticks changes. Attached is an updated patch to Classic RCU that applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. This passes 10-hour concurrent rcutorture and online-offline testing on 128-CPU ppc64 without dynticks enabled, and exposes some timekeeping bugs in presence of dynticks (exciting working on a system where "sleep 1" hangs until interrupted...), which were fixed in the 2.6.27 kernel. It is getting more reliable than mainline by some measures, so the next version will be against -tip for inclusion. See also Manfred Spraul's recent patches (or his earlier work from 2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2). We will converge onto a common patch in the fullness of time, but are currently exploring different regions of the design space. That said, I have already gratefully stolen quite a few of Manfred's ideas. This patch provides CONFIG_RCU_FANOUT, which controls the bushiness of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on 64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT, there is no hierarchy. By default, the RCU initialization code will adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable this balancing, allowing the hierarchy to be exactly aligned to the underlying hardware. Up to two levels of hierarchy are permitted (in addition to the root node), allowing up to 16,384 CPUs on 32-bit systems and up to 262,144 CPUs on 64-bit systems. I just know that I am going to regret saying this, but this seems more than sufficient for the foreseeable future. (Some architectures might wish to set CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs. If this becomes a real problem, additional levels can be added, but I doubt that it will make a significant difference on real hardware.) In the common case, a given CPU will manipulate its private rcu_data structure and the rcu_node structure that it shares with its immediate neighbors. This can reduce both lock and memory contention by multiple orders of magnitude, which should eliminate the need for the strange manipulations that are reported to be required when running Linux on very large systems. Some shortcomings: o More bugs will probably surface as a result of an ongoing line-by-line code inspection. Patches will be provided as required. o There are probably hangs, rcutorture failures, &c. Seems quite stable on a 128-CPU machine, but that is kind of small compared to 4096 CPUs. However, seems to do better than mainline. Patches will be provided as required. o The memory footprint of this version is several KB larger than rcuclassic. A separate UP-only rcutiny patch will be provided, which will reduce the memory footprint significantly, even compared to the old rcuclassic. One such patch passes light testing, and has a memory footprint smaller even than rcuclassic. Initial reaction from various embedded guys was "it is not worth it", so am putting it aside. Credits: o Manfred Spraul for ideas, review comments, and bugs spotted, as well as some good friendly competition. ;-) o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton for reviews and comments. o Thomas Gleixner for much-needed help with some timer issues (see patches below). o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos, Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines alive despite my heavy abuse^Wtesting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>