aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* AMD IOMMU: add necessary header defines for stats countingJoerg Roedel2009-01-03
| | | | | | Impact: add defines to make iommu stats collection configurable Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add Kconfig entry for statistic collection codeJoerg Roedel2009-01-03
| | | | | | Impact: adds new Kconfig entry Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: use dev_name in iommu_enable functionJoerg Roedel2009-01-03
| | | | | | Impact: cleanup Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: use calc_devid in prealloc_protection_domainsJoerg Roedel2009-01-03
| | | | | | Impact: cleanup Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: convert amd_iommu_isolate to boolJoerg Roedel2009-01-03
| | | | | | Impact: cleanup Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: convert iommu->need_sync to boolJoerg Roedel2009-01-03
| | | | | | Impact: use bool instead of int for iommu->need_sync Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: use dev_name instead of self-build print_devidJoerg Roedel2009-01-03
| | | | | | Impact: use generic dev_name instead of own function Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: allocate a new protection for hotplugged devicesJoerg Roedel2009-01-03
| | | | | | Impact: also hotplug devices benefit from device isolation Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add a domain flag for default domainsJoerg Roedel2009-01-03
| | | | | | Impact: adds a new protection domain flag Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: register functions for the IOMMU APIJoerg Roedel2009-01-03
| | | | Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add domain address lookup function for IOMMU APIJoerg Roedel2009-01-03
| | | | | | Impact: add a generic function to lockup addresses in protection domains Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add domain unmap function for IOMMU APIJoerg Roedel2009-01-03
| | | | | | Impact: add a generic function to unmap pages into protection domains Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add domain map function for IOMMU APIJoerg Roedel2009-01-03
| | | | | | Impact: add a generic function to map pages into protection domains Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add device attach function for IOMMU APIJoerg Roedel2009-01-03
| | | | | | Impact: add a generic function to attach devices to protection domains Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add device detach function for IOMMU APIJoerg Roedel2009-01-03
| | | | | | Impact: add a generic function to detach devices from protection domains Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add domain destroy function for IOMMU APIJoerg Roedel2009-01-03
| | | | | | Impact: add a generic function for releasing protection domains Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add domain init function for IOMMU APIJoerg Roedel2009-01-03
| | | | | | Impact: add a generic function for allocation protection domains Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add domain cleanup helper functionJoerg Roedel2009-01-03
| | | | | | Impact: add a function to remove all devices from a domain Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add device notifier callbackJoerg Roedel2009-01-03
| | | | | | Impact: inform IOMMU about state change of a device in the driver core Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add device detach helper functionsJoerg Roedel2009-01-03
| | | | | | Impact: add helper functions to detach a device from a domain Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: rename set_device_domain functionJoerg Roedel2009-01-03
| | | | | | Impact: rename set_device_domain() to attach_device() Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add device reference counting for protection domainsJoerg Roedel2009-01-03
| | | | | | Impact: know how many devices are assigned to a domain Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add checks for dma_ops domain to dma_ops functionsJoerg Roedel2009-01-03
| | | | | | Impact: detect when a driver uses a device assigned otherwise Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add protection domain flagsJoerg Roedel2009-01-03
| | | | | | | | | | Imapct: add a new struct member to 'struct protection_domain' When using protection domains for dma_ops and KVM its better to know for which subsystem it was allocated. Add a flags member to struct protection domain for that purpose. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add iommu_flush_domain functionJoerg Roedel2009-01-03
| | | | | | Impact: add a function to flush a domain id on every IOMMU Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: don't remove protection domain from iommu_pd_listJoerg Roedel2009-01-03
| | | | | | | | | | Impact: save unneeded logic to add and remove domains to the list The removal of a protection domain from the iommu_pd_list is not necessary. Another benefit is that we save complexity because we don't have to readd it later when the device no longer uses the domain. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: move invalidation command building to a separate functionJoerg Roedel2009-01-03
| | | | | | Impact: refactoring of iommu_queue_inv_iommu_pages Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: refactor completion wait handling into separate functionsJoerg Roedel2009-01-03
| | | | | | | | | Impact: split one function into three The separate functions are required synchronize commands across all hardware IOMMUs in the system. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: add domain id free functionJoerg Roedel2009-01-03
| | | | | | Impact: add code to release a domain id Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: make dma_ops_free_pagetable genericJoerg Roedel2009-01-03
| | | | | | | | | | Impact: change code to free pagetables from protection domains The dma_ops_free_pagetable function can only free pagetables from dma_ops domains. Change that to free pagetables of pure protection domains. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* AMD IOMMU: rename iommu_map to iommu_map_pageJoerg Roedel2009-01-03
| | | | | | | | | Impact: function rename The iommu_map function maps only one page. Make this clear in the function name. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* Disallow gcc versions 3.{0,1}Ingo Molnar2009-01-02
| | | | | | | | | | GCC 3.0 and 3.1 are too old to build a working kernel. Signed-off-by: Ingo Molnar <mingo@elte.hu> [ This check got dropped as obsolete when I simplified the gcc header inclusion mess in f153b82121b0366fe0e5f9553545cce237335175, but Willy Tarreau reports actually having those old versions still.. -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'cpus4096-for-linus-2' of ↵Linus Torvalds2009-01-02
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually
| * x86: export vector_used_by_percpu_irqIngo Molnar2008-12-23
| | | | | | | | | | | | | | | | | | | | | | | | Impact: build fix lguest can be built as a module and makes use of this new symbol: ERROR: "vector_used_by_percpu_irq" [drivers/lguest/lg.ko] undefined! export it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()Suresh Siddha2008-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These commits: commit 95d313cf1c1ecedc8bec5727b09bdacbf67dfc45 Author: Mike Travis <travis@sgi.com> Date: Tue Dec 16 17:33:54 2008 -0800 x86: Add cpu_mask_to_apicid_and and commit 6eeb7c5a99434596c5953a95baa17d2f085664e3 Author: Mike Travis <travis@sgi.com> Date: Tue Dec 16 17:33:55 2008 -0800 x86: update add-cpu_mask_to_apicid_and to use struct cpumask* broke interrupt delivery on x2apic platforms. As x2apic cluster mode uses logical delivery mode, we need to use logical apicid instead of physical apicid in x2apic_cpu_mask_to_apicid_and() Impact: fixes the broken interrupt delivery issue on generic x2apic platforms. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Mike Travis <travis@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: nominate preferred wakeup cpu, fixVaidyanathan Srinivasan2008-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrew Morton reported: > kernel/sched.c: In function 'schedule': > kernel/sched.c:3679: warning: 'active_balance' may be used uninitialized in this function > > This warning is correct - the code is buggy. In sched.c load_balance_newidle, there's real potential use of uninitialised variable - fix it. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86: fix lguest used_vectors breakage, -v2Yinghai Lu2008-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix lguest, clean up 32-bit lguest used used_vectors to record vectors, but that model of allocating vectors changed and got broken, after we changed vector allocation to a per_cpu array. Try enable that for 64bit, and the array is used for all vectors that are not managed by vector_irq per_cpu array. Also kill system_vectors[], that is now a duplication of the used_vectors bitmap. [ merged in cpus4096 due to io_apic.c cpumask changes. ] [ -v2, fix build failure ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86: fix warning in arch/x86/kernel/io_apic.cIngo Molnar2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | this warning: arch/x86/kernel/io_apic.c: In function ‘ir_set_msi_irq_affinity’: arch/x86/kernel/io_apic.c:3373: warning: ‘cfg’ may be used uninitialized in this function triggers because the variable was truly uninitialized. We'd crash on entering this code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: fix warning in kernel/sched.cIngo Molnar2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix cpumask conversion bug this warning: kernel/sched.c: In function ‘find_busiest_group’: kernel/sched.c:3429: warning: passing argument 1 of ‘__first_cpu’ from incompatible pointer type shows that we forgot to convert a new patch to the new cpumask APIs. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: move test_sd_parent() to an SMP section of sched.hIngo Molnar2008-12-19
| | | | | | | | | | | | Impact: build fix Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0Vaidyanathan Srinivasan2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: change task balancing to save power more agressively Add SD_BALANCE_NEWIDLE flag at MC level and CPU level if sched_mc is set. This helps power savings and will not affect performance when sched_mc=0 Ingo and Mike Galbraith have optimised the SD flags by removing SD_BALANCE_NEWIDLE at MC and CPU level. This helps performance but hurts power savings since this slows down task consolidation by reducing the number of times load_balance is run. sched: fine-tune SD_MC_INIT commit 14800984706bf6936bbec5187f736e928be5c218 Author: Mike Galbraith <efault@gmx.de> Date: Fri Nov 7 15:26:50 2008 +0100 sched: re-tune balancing -- revert commit 9fcd18c9e63e325dbd2b4c726623f760788d5aa8 Author: Ingo Molnar <mingo@elte.hu> Date: Wed Nov 5 16:52:08 2008 +0100 This patch selectively enables SD_BALANCE_NEWIDLE flag only when sched_mc is set to 1 or 2. This helps power savings by task consolidation and also does not hurt performance at sched_mc=0 where all power saving optimisations are turned off. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: activate active load balancing in new idle cpusVaidyanathan Srinivasan2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: tweak task balancing to save power more agressively Active load balancing is a process by which migration thread is woken up on the target CPU in order to pull current running task on another package into this newly idle package. This method is already in use with normal load_balance(), this patch introduces this method to new idle cpus when sched_mc is set to POWERSAVINGS_BALANCE_WAKEUP. This logic provides effective consolidation of short running daemon jobs in a almost idle system The side effect of this patch may be ping-ponging of tasks if the system is moderately utilised. May need to adjust the iterations before triggering. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: bias task wakeups to preferred semi-idle packagesVaidyanathan Srinivasan2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: tweak task wakeup to save power more agressively Preferred wakeup cpu (from a semi idle package) has been nominated in find_busiest_group() in the previous patch. Use this information in sched_mc_preferred_wakeup_cpu in function wake_idle() to bias task wakeups if the following conditions are satisfied: - The present cpu that is trying to wakeup the process is idle and waking the target process on this cpu will potentially wakeup a completely idle package - The previous cpu on which the target process ran is also idle and hence selecting the previous cpu may wakeup a semi idle cpu package - The task being woken up is allowed to run in the nominated cpu (cpu affinity and restrictions) Basically if both the current cpu and the previous cpu on which the task ran is idle, select the nominated cpu from semi idle cpu package for running the new task that is waking up. Cache hotness is considered since the actual biasing happens in wake_idle() only if the application is cache cold. This technique will effectively move short running bursty jobs in a mostly idle system. Wakeup biasing for power savings gets automatically disabled if system utilisation increases due to the fact that the probability of finding both this_cpu and prev_cpu idle decreases. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: nominate preferred wakeup cpuVaidyanathan Srinivasan2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: extend load-balancing code (no change in behavior yet) When the system utilisation is low and more cpus are idle, then the process waking up from sleep should prefer to wakeup an idle cpu from semi-idle cpu package (multi core package) rather than a completely idle cpu package which would waste power. Use the sched_mc balance logic in find_busiest_group() to nominate a preferred wakeup cpu. This info can be stored in appropriate sched_domain, but updating this info in all copies of sched_domain is not practical. Hence this information is stored in root_domain struct which is one copy per partitioned sched domain. The root_domain can be accessed from each cpu's runqueue and there is one copy per partitioned sched domain. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: favour lower logical cpu number for sched_mc balanceVaidyanathan Srinivasan2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: change load-balancing direction to match that of irqbalanced Just in case two groups have identical load, prefer to move load to lower logical cpu number rather than the present logic of moving to higher logical number. find_busiest_group() tries to look for a group_leader that has spare capacity to take more tasks and freeup an appropriate least loaded group. Just in case there is a tie and the load is equal, then the group with higher logical number is favoured. This conflicts with user space irqbalance daemon that will move interrupts to lower logical number if the system utilisation is very low. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: framework for sched_mc/smt_power_savings=NGautham R Shenoy2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: extend range of /sys/devices/system/cpu/sched_mc_power_savings Currently the sched_mc/smt_power_savings variable is a boolean, which either enables or disables topology based power savings. This patch extends the behaviour of the variable from boolean to multivalued, such that based on the value, we decide how aggressively do we want to perform powersavings balance at appropriate sched domain based on topology. Variable levels of power saving tunable would benefit end user to match the required level of power savings vs performance trade-off depending on the system configuration and workloads. This version makes the sched_mc_power_savings global variable to take more values (0,1,2). Later versions can have a single tunable called sched_power_savings instead of sched_{mc,smt}_power_savings. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: convert BALANCE_FOR_xx_POWER to inline functionsVaidyanathan Srinivasan2008-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: cleanup BALANCE_FOR_MC_POWER and similar macros defined in sched.h are not constants and have various condition checks and significant amount of code that is not suitable to be contain in a macro. Also there could be side effects on the expressions passed to some of them like test_sd_parent(). This patch converts all complex macros related to power savings balance to inline functions. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86: use possible_cpus=NUM to extend the possible cpus allowedMike Travis2008-12-18
| | | | | | | | | | | | | | | | | | | | | | | | Impact: add new boot parameter Use possible_cpus=NUM kernel parameter to extend the number of possible cpus. The ability to HOTPLUG ON cpus that are "possible" but not "present" is dealt with in a later patch. Signed-off-by: Mike Travis <travis@sgi.com>
| * x86: fix cpu_mask_to_apicid_and to include cpu_online_maskMike Travis2008-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix potential APIC crash In determining the destination apicid, there are usually three cpumasks that are considered: the incoming cpumask arg, cfg->domain and the cpu_online_mask. Since we are just introducing the cpu_mask_to_apicid_and function, make sure it includes the cpu_online_mask in it's evaluation. [Added with this patch.] There are two io_apic.c functions that did not previously use the cpu_online_mask: setup_IO_APIC_irq and msi_compose_msg. Both of these simply used cpu_mask_to_apicid(cfg->domain & TARGET_CPUS), and all but one arch (NUMAQ[*]) returns only online cpus in the TARGET_CPUS mask, so the behavior is identical for all cases. [*: NUMAQ bug?] Note that alloc_cpumask_var is only used for the 32-bit cases where it's highly likely that the cpumask set size will be small and therefore CPUMASK_OFFSTACK=n. But if that's not the case, failing the allocate will cause the same return value as the default. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'x86/apic' into cpus4096Ingo Molnar2008-12-18
| |\ | | | | | | | | | | | | | | | This done for conflict prevention: we merge it into the cpus4096 tree because upcoming cpumask changes will touch apic.c that would collide with x86/apic otherwise.