aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* rcu: Fix day-zero grace-period initialization/cleanup racePaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current approach to grace-period initialization is vulnerable to extremely low-probability races. These races stem from the fact that the old grace period is marked completed on the same traversal through the rcu_node structure that is marking the start of the new grace period. This means that some rcu_node structures will believe that the old grace period is still in effect at the same time that other rcu_node structures believe that the new grace period has already started. These sorts of disagreements can result in too-short grace periods, as shown in the following scenario: 1. CPU 0 completes a grace period, but needs an additional grace period, so starts initializing one, initializing all the non-leaf rcu_node structures and the first leaf rcu_node structure. Because CPU 0 is both completing the old grace period and starting a new one, it marks the completion of the old grace period and the start of the new grace period in a single traversal of the rcu_node structures. Therefore, CPUs corresponding to the first rcu_node structure can become aware that the prior grace period has completed, but CPUs corresponding to the other rcu_node structures will see this same prior grace period as still being in progress. 2. CPU 1 passes through a quiescent state, and therefore informs the RCU core. Because its leaf rcu_node structure has already been initialized, this CPU's quiescent state is applied to the new (and only partially initialized) grace period. 3. CPU 1 enters an RCU read-side critical section and acquires a reference to data item A. Note that this CPU believes that its critical section started after the beginning of the new grace period, and therefore will not block this new grace period. 4. CPU 16 exits dyntick-idle mode. Because it was in dyntick-idle mode, other CPUs informed the RCU core of its extended quiescent state for the past several grace periods. This means that CPU 16 is not yet aware that these past grace periods have ended. Assume that CPU 16 corresponds to the second leaf rcu_node structure -- which has not yet been made aware of the new grace period. 5. CPU 16 removes data item A from its enclosing data structure and passes it to call_rcu(), which queues a callback in the RCU_NEXT_TAIL segment of the callback queue. 6. CPU 16 enters the RCU core, possibly because it has taken a scheduling-clock interrupt, or alternatively because it has more than 10,000 callbacks queued. It notes that the second most recent grace period has completed (recall that because it corresponds to the second as-yet-uninitialized rcu_node structure, it cannot yet become aware that the most recent grace period has completed), and therefore advances its callbacks. The callback for data item A is therefore in the RCU_NEXT_READY_TAIL segment of the callback queue. 7. CPU 0 completes initialization of the remaining leaf rcu_node structures for the new grace period, including the structure corresponding to CPU 16. 8. CPU 16 again enters the RCU core, again, possibly because it has taken a scheduling-clock interrupt, or alternatively because it now has more than 10,000 callbacks queued. It notes that the most recent grace period has ended, and therefore advances its callbacks. The callback for data item A is therefore in the RCU_DONE_TAIL segment of the callback queue. 9. All CPUs other than CPU 1 pass through quiescent states. Because CPU 1 already passed through its quiescent state, the new grace period completes. Note that CPU 1 is still in its RCU read-side critical section, still referencing data item A. 10. Suppose that CPU 2 wais the last CPU to pass through a quiescent state for the new grace period, and suppose further that CPU 2 did not have any callbacks queued, therefore not needing an additional grace period. CPU 2 therefore traverses all of the rcu_node structures, marking the new grace period as completed, but does not initialize a new grace period. 11. CPU 16 yet again enters the RCU core, yet again possibly because it has taken a scheduling-clock interrupt, or alternatively because it now has more than 10,000 callbacks queued. It notes that the new grace period has ended, and therefore advances its callbacks. The callback for data item A is therefore in the RCU_DONE_TAIL segment of the callback queue. This means that this callback is now considered ready to be invoked. 12. CPU 16 invokes the callback, freeing data item A while CPU 1 is still referencing it. This scenario represents a day-zero bug for TREE_RCU. This commit therefore ensures that the old grace period is marked completed in all leaf rcu_node structures before a new grace period is marked started in any of them. That said, it would have been insanely difficult to force this race to happen before the grace-period initialization process was preemptible. Therefore, this commit is not a candidate for -stable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Conflicts: kernel/rcutree.c
* rcu: Make rcutree module parameters visible in sysfsPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | The module parameters blimit, qhimark, and qlomark (and more recently, rcu_fanout_leaf) have permission masks of zero, so that their values are not visible from sysfs. This is unnecessary and inconvenient to administrators who might like an easy way to see what these values are on a running system. This commit therefore sets their permission masks to 0444, allowing them to be read but not written. Reported-by: Rusty Russell <rusty@ozlabs.org> Reported-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Control grace-period duration from sysfsPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although almost everyone is well-served by the defaults, some uses of RCU benefit from shorter grace periods, while others benefit more from the greater efficiency provided by longer grace periods. Situations requiring a large number of grace periods to elapse (and wireshark startup has been called out as an example of this) are helped by lower-latency grace periods. Furthermore, in some embedded applications, people are willing to accept a small degradation in update efficiency (due to there being more of the shorter grace-period operations) in order to gain the lower latency. In contrast, those few systems with thousands of CPUs need longer grace periods because the CPU overhead of a grace period rises roughly linearly with the number of CPUs. Such systems normally do not make much use of facilities that require large numbers of grace periods to elapse, so this is a good tradeoff. Therefore, this commit allows the durations to be controlled from sysfs. There are two sysfs parameters, one named "jiffies_till_first_fqs" that specifies the delay in jiffies from the end of grace-period initialization until the first attempt to force quiescent states, and the other named "jiffies_till_next_fqs" that specifies the delay (again in jiffies) between subsequent attempts to force quiescent states. They both default to three jiffies, which is compatible with the old hard-coded behavior. At some future time, it may be possible to automatically increase the grace-period length with the number of CPUs, but we do not yet have sufficient data to do a good job. Preliminary data indicates that we should add an addiitonal jiffy to each of the delays for every 200 CPUs in the system, but more experimentation is needed. For now, the number of systems with more than 1,000 CPUs is small enough that this can be relegated to boot-time hand tuning. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Prevent force_quiescent_state() memory contentionPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | | | | Large systems running RCU_FAST_NO_HZ kernels see extreme memory contention on the rcu_state structure's ->fqslock field. This can be avoided by disabling RCU_FAST_NO_HZ, either at compile time or at boot time (via the nohz kernel boot parameter), but large systems will no doubt become sensitive to energy consumption. This commit therefore uses a combining-tree approach to spread the memory contention across new cache lines in the leaf rcu_node structures. This can be thought of as a tournament lock that has only a try-lock acquisition primitive. The effect on small systems is minimal, because such systems have an rcu_node "tree" consisting of a single node. In addition, this functionality is not used on fastpaths. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Adjust debugfs tracing for kthread-based quiescent-state forcingPaul E. McKenney2012-09-23
| | | | | | | | Moving quiescent-state forcing into a kthread dispenses with the need for the ->n_rp_need_fqs field, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Allow RCU quiescent-state forcing to be preemptedPaul E. McKenney2012-09-23
| | | | | | | | | | | | | RCU quiescent-state forcing is currently carried out without preemption points, which can result in excessive latency spikes on large systems (many hundreds or thousands of CPUs). This patch therefore inserts a voluntary preemption point into force_qs_rnp(), which should greatly reduce the magnitude of these spikes. Reported-by: Mike Galbraith <mgalbraith@suse.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Move quiescent-state forcing into kthreadPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | As the first step towards allowing quiescent-state forcing to be preemptible, this commit moves RCU quiescent-state forcing into the same kthread that is now used to initialize and clean up after grace periods. This is yet another step towards keeping scheduling latency down to a dull roar. Updated to change from raw_spin_lock_irqsave() to raw_spin_lock_irq() and to remove the now-unused rcu_state structure fields as suggested by Peter Zijlstra. Reported-by: Mike Galbraith <mgalbraith@suse.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Segregate rcu_state fields to improve cache localityDimitri Sivanich2012-09-23
| | | | | | | | | | | | | | The fields in the rcu_state structure that are protected by the root rcu_node structure's ->lock can share a cache line with the fields protected by ->onofflock. This can result in excessive memory contention on large systems, so this commit applies ____cacheline_internodealigned_in_smp to the ->onofflock field in order to segregate them. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Dimitri Sivanich <sivanich@sgi.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Provide OOM handler to motivate lazy RCU callbacksPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | | | | | | | In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a large number of lazy callbacks, which as the name implies will be slow to be invoked. This can be a problem on small-memory systems, where the default 6-second sleep for CPUs having only lazy RCU callbacks could well be fatal. This commit therefore installs an OOM hander that ensures that every CPU with lazy callbacks has at least one non-lazy callback, in turn ensuring timely advancement for these callbacks. Updated to fix bug that disabled OOM killing, noted by Lai Jiangshan. Updated to push the for_each_rcu_flavor() loop into rcu_oom_notify_cpu(), thus reducing the number of IPIs, as suggested by Steven Rostedt. Also to make the for_each_online_cpu() loop be preemptible. (Later, it might be good to use smp_call_function(), as suggested by Peter Zijlstra.) Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Prevent offline CPUs from executing RCU core codePaul E. McKenney2012-09-23
| | | | | | | | | | | | | | Earlier versions of RCU invoked the RCU core from the CPU_DYING notifier in order to note a quiescent state for the outgoing CPU. Because the CPU is marked "offline" during the execution of the CPU_DYING notifiers, the RCU core had to tolerate being invoked from an offline CPU. However, commit b1420f1c (Make rcu_barrier() less disruptive) left only tracing code in the CPU_DYING notifier, so the RCU core need no longer execute on offline CPUs. This commit therefore enforces this restriction. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Break up rcu_gp_kthread() into subfunctionsPaul E. McKenney2012-09-23
| | | | | | | | | Then rcu_gp_kthread() function is too large and furthermore needs to have the force_quiescent_state() code pulled in. This commit therefore breaks up rcu_gp_kthread() into rcu_gp_init() and rcu_gp_cleanup(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Allow RCU grace-period cleanup to be preemptedPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | | RCU grace-period cleanup is currently carried out with interrupts disabled, which can result in excessive latency spikes on large systems (many hundreds or thousands of CPUs). This patch therefore makes the RCU grace-period cleanup be preemptible, including voluntary preemption points, which should eliminate those latency spikes. Similar spikes from forcing of quiescent states will be dealt with similarly by later patches. Updated to replace uses of spin_lock_irqsave() with spin_lock_irq(), as suggested by Peter Zijlstra. Reported-by: Mike Galbraith <mgalbraith@suse.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Move RCU grace-period cleanup into kthreadPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | As a first step towards allowing grace-period cleanup to be preemptible, this commit moves the RCU grace-period cleanup into the same kthread that is now used to initialize grace periods. This is needed to keep scheduling latency down to a dull roar. [ paulmck: Get rid of stray spin_lock_irqsave() calls. ] Reported-by: Mike Galbraith <mgalbraith@suse.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Allow RCU grace-period initialization to be preemptedPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | RCU grace-period initialization is currently carried out with interrupts disabled, which can result in 200-microsecond latency spikes on systems on which RCU has been configured for 4096 CPUs. This patch therefore makes the RCU grace-period initialization be preemptible, which should eliminate those latency spikes. Similar spikes from grace-period cleanup and the forcing of quiescent states will be dealt with similarly by later patches. Reported-by: Mike Galbraith <mgalbraith@suse.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Prevent initialization-time quiescent-state racePaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next step in reducing RCU's grace-period initialization latency on large systems will make this initialization preemptible. Unfortunately, making the grace-period initialization subject to interrupts (let alone preemption) exposes the following race on systems whose rcu_node tree contains more than one node: 1. CPU 31 starts initializing the grace period, including the first leaf rcu_node structures, and is then preempted. 2. CPU 0 refers to the first leaf rcu_node structure, and notes that a new grace period has started. It passes through a quiescent state shortly thereafter, and informs the RCU core of this rite of passage. 3. CPU 0 enters an RCU read-side critical section, acquiring a pointer to an RCU-protected data item. 4. CPU 31 takes an interrupt whose handler removes the data item referenced by CPU 0 from the data structure, and registers an RCU callback in order to free it. 5. CPU 31 resumes initializing the grace period, including its own rcu_node structure. In invokes rcu_start_gp_per_cpu(), which advances all callbacks, including the one registered in #4 above, to be handled by the current grace period. 6. The remaining CPUs pass through quiescent states and inform the RCU core, but CPU 0 remains in its RCU read-side critical section, still referencing the now-removed data item. 7. The grace period completes and all the callbacks are invoked, including the one that frees the data item that CPU 0 is still referencing. Oops!!! One way to avoid this race is to remove grace-period acceleration from rcu_start_gp_per_cpu(). Now, the only reason for this acceleration was to allow CPUs bringing RCU out of idle state to have their callbacks invoked after only one grace period, rather than the two grace periods that would otherwise be required. But this acceleration does not work when RCU grace-period initialization is moved to a kthread because the CPU posting the callback is no longer necessarily the CPU that is initializing the resulting grace period. This commit therefore removes this now-pointless (and soon to be dangerous) grace-period acceleration, thus avoiding the above race. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* rcu: Move RCU grace-period initialization into a kthreadPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | As the first step towards allowing grace-period initialization to be preemptible, this commit moves the RCU grace-period initialization into its own kthread. This is needed to keep large-system scheduling latency at reasonable levels. Also change raw_spin_lock_irqsave() to raw_spin_lock_irq() as suggested by Peter Zijlstra in review comments. Reported-by: Mike Galbraith <mgalbraith@suse.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Fix day-one dyntick-idle stall-warning bugPaul E. McKenney2012-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each grace period is supposed to have at least one callback waiting for that grace period to complete. However, if CONFIG_NO_HZ=n, an extra callback-free grace period is no big problem -- it will chew up a tiny bit of CPU time, but it will complete normally. In contrast, CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to sleep indefinitely, in turn indefinitely delaying completion of the callback-free grace period. Given that nothing is waiting on this grace period, this is also not a problem. That is, unless RCU CPU stall warnings are also enabled, as they are in recent kernels. In this case, if a CPU wakes up after at least one minute of inactivity, an RCU CPU stall warning will result. The reason that no one noticed until quite recently is that most systems have enough OS noise that they will never remain absolutely idle for a full minute. But there are some embedded systems with cut-down userspace configurations that consistently get into this situation. All this begs the question of exactly how a callback-free grace period gets started in the first place. This can happen due to the fact that CPUs do not necessarily agree on which grace period is in progress. If a CPU still believes that the grace period that just completed is still ongoing, it will believe that it has callbacks that need to wait for another grace period, never mind the fact that the grace period that they were waiting for just completed. This CPU can therefore erroneously decide to start a new grace period. Note that this can happen in TREE_RCU and TREE_PREEMPT_RCU even on a single-CPU system: Deadlock considerations mean that the CPU that detected the end of the grace period is not necessarily officially informed of this fact for some time. Once this CPU notices that the earlier grace period completed, it will invoke its callbacks. It then won't have any callbacks left. If no other CPU has any callbacks, we now have a callback-free grace period. This commit therefore makes CPUs check more carefully before starting a new grace period. This new check relies on an array of tail pointers into each CPU's list of callbacks. If the CPU is up to date on which grace periods have completed, it checks to see if any callbacks follow the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks follow the RCU_WAIT_TAIL segment. The reason that this works is that the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment as soon as the CPU is officially notified that the old grace period has ended. This change is to cpu_needs_another_gp(), which is called in a number of places. The only one that really matters is in rcu_start_gp(), where the root rcu_node structure's ->lock is held, which prevents any other CPU from starting or completing a grace period, so that the comparison that determines whether the CPU is missing the completion of a grace period is stable. Reported-by: Becky Bruce <bgillbruce@gmail.com> Reported-by: Subodh Nijsure <snijsure@grid-net.com> Reported-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Walmsley <paul@pwsan.com> # OMAP3730, OMAP4430 Cc: stable@vger.kernel.org
* trace: Don't declare trace_*_rcuidle functions in modulesJosh Triplett2012-09-12
| | | | | | | | | | | | | | | | | | | | | Tracepoints declare a static inline trace_*_rcuidle variant of the trace function, to support safely generating trace events from the idle loop. Module code never actually uses that variant of trace functions, because modules don't run code that needs tracing with RCU idled. However, the declaration of those otherwise unused functions causes the module to reference rcu_idle_exit and rcu_idle_enter, which RCU does not export to modules. To avoid this, don't generate trace_*_rcuidle functions for tracepoints declared in module code. Link: http://lkml.kernel.org/r/20120905062306.GA14756@leaf Reported-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Linux 3.6-rc5Linus Torvalds2012-09-08
|
* Merge branch 'fixes-for-3.6' of ↵Linus Torvalds2012-09-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA-mapping fixes from Marek Szyprowski: "Another set of fixes for ARM dma-mapping subsystem. Commit e9da6e9905e6 replaced custom consistent buffer remapping code with generic vmalloc areas. It however introduced some regressions caused by limited support for allocations in atomic context. This series contains fixes for those regressions. For some subplatforms the default, pre-allocated pool for atomic allocations turned out to be too small, so a function for setting its size has been added. Another set of patches adds support for atomic allocations to IOMMU-aware DMA-mapping implementation. The last part of this pull request contains two fixes for Contiguous Memory Allocator, which relax too strict requirements." * 'fixes-for-3.6' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: ARM: dma-mapping: IOMMU allocates pages from atomic_pool with GFP_ATOMIC ARM: dma-mapping: Introduce __atomic_get_pages() for __iommu_get_pages() ARM: dma-mapping: Refactor out to introduce __in_atomic_pool ARM: dma-mapping: atomic_pool with struct page **pages ARM: Kirkwood: increase atomic coherent pool size ARM: DMA-Mapping: print warning when atomic coherent allocation fails ARM: DMA-Mapping: add function for setting coherent pool size from platform code ARM: relax conditions required for enabling Contiguous Memory Allocator mm: cma: fix alignment requirements for contiguous regions
| * ARM: dma-mapping: IOMMU allocates pages from atomic_pool with GFP_ATOMICHiroshi Doyu2012-08-28
| | | | | | | | | | | | | | | | | | Make use of the same atomic pool as DMA does, and skip a kernel page mapping which can involve sleep'able operations at allocating a kernel page table. Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
| * ARM: dma-mapping: Introduce __atomic_get_pages() for __iommu_get_pages()Hiroshi Doyu2012-08-28
| | | | | | | | | | | | | | | | | | Support atomic allocation in __iommu_get_pages(). Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com> [moved __atomic_get_pages() under #ifdef CONFIG_ARM_DMA_USE_IOMMU to avoid unused fuction warning for no-IOMMU case] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
| * ARM: dma-mapping: Refactor out to introduce __in_atomic_poolHiroshi Doyu2012-08-28
| | | | | | | | | | | | | | Check the given range("start", "size") is included in "atomic_pool" or not. Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
| * ARM: dma-mapping: atomic_pool with struct page **pagesHiroshi Doyu2012-08-28
| | | | | | | | | | | | | | | | | | struct page **pages is necessary to align with non atomic path in __iommu_get_pages(). atomic_pool() has the intialized **pages instead of just *page. Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
| * ARM: Kirkwood: increase atomic coherent pool sizeMarek Szyprowski2012-08-28
| | | | | | | | | | | | | | | | | | | | The default 256 KiB coherent pool may be too small for some of the Kirkwood devices, so increase it to make sure that devices will be able to allocate their buffers with GFP_ATOMIC flag. Suggested-by: Josh Coombs <josh.coombs@gmail.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Jason Cooper <jason@lakedaemon.net>
| * ARM: DMA-Mapping: print warning when atomic coherent allocation failsMarek Szyprowski2012-08-28
| | | | | | | | | | | | | | | | Print a loud warning when system runs out of memory from atomic DMA coherent pool to let users notice the potential problem. Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
| * ARM: DMA-Mapping: add function for setting coherent pool size from platform codeMarek Szyprowski2012-08-28
| | | | | | | | | | | | | | | | | | | | | | Some platforms might require to increase atomic coherent pool to make sure that their device will be able to allocate all their buffers from atomic context. This function can be also used to decrease atomic coherent pool size if coherent allocations are not used for the given sub-platform. Suggested-by: Josh Coombs <josh.coombs@gmail.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
| * ARM: relax conditions required for enabling Contiguous Memory AllocatorMarek Szyprowski2012-08-28
| | | | | | | | | | | | | | | | | | | | | | | | Contiguous Memory Allocator requires only paging and MMU enabled not particular CPU architectures, so there is no need for strict dependency on CPU type. This enables to use CMA on some older ARM v5 systems which also might need large contiguous blocks for the multimedia processing hw modules. Reported-by: Prabhakar Lad <prabhakar.lad@ti.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Prabhakar Lad <prabhakar.lad@ti.com>
| * mm: cma: fix alignment requirements for contiguous regionsMarek Szyprowski2012-08-28
| | | | | | | | | | | | | | | | | | | | | | | | | | Contiguous Memory Allocator requires each of its regions to be aligned in such a way that it is possible to change migration type for all pageblocks holding it and then isolate page of largest possible order from the buddy allocator (which is MAX_ORDER-1). This patch relaxes alignment requirements by one order, because MAX_ORDER alignment is not really needed. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> CC: Michal Nazarewicz <mina86@mina86.com> Acked-by: Michal Nazarewicz <mina86@mina86.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-09-08
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input subsystem updates from Dmitry Torokhov. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wacom - add support for EMR on Cintiq 24HD touch Input: i8042 - add Gigabyte T1005 series netbooks to noloop table Input: imx_keypad - reset the hardware before enabling Input: edt-ft5x06 - fix build error when compiling wthout CONFIG_DEBUG_FS
| * | Input: wacom - add support for EMR on Cintiq 24HD touchJason Gerecke2012-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for the EMR digitizer on the Cintiq 24HD touch. The EMR digitizer should work identically to that found on the Cintiq 24HD. The touch digitizer is a separate USB device similar to how we split apart some other devices. Signed-off-by: Jason Gerecke <killertofu@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| * | Input: i8042 - add Gigabyte T1005 series netbooks to noloop tableDmitry Torokhov2012-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They all define their chassis type as "Other" and therefore are not categorized as "laptops" by the driver, which tries to perform AUX IRQ delivery test which fails and causes touchpad not working. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42620 Cc: stable@kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| * | Input: imx_keypad - reset the hardware before enablingMichael Grzeschik2012-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure the hardware is correctly initialized before requesting the interrupt, otherwise if a key was already touched since power-on the kernel enters an interrupt loop. To fix this issue we clear pending interrupt sources. We also have to make sure clk is enabled while changing the keypad registers. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| * | Input: edt-ft5x06 - fix build error when compiling wthout CONFIG_DEBUG_FSGuenter Roeck2012-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the following breakage: edt-ft5x06.c: In function edt_ft5x06_ts_remove: edt-ft5x06.c:846:14: error: struct edt_ft5x06_ts_data has no member named raw_buffer Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Simon Budig <simon.budig@kernelconcepts.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
* | | Merge branch 'upstream-fixes' of ↵Linus Torvalds2012-09-07
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: "It contains a fix for Eaton Ellipse MAX UPS from Alan Stern, performance improvement (not processing debug data if noone is interested), by Henrik Rydberg, and allowing tpkbd-driven devices to work even with generic driver in a crippled mode, by Andres Freund." * 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured HID: Only dump input if someone is listening HID: add NOGET quirk for Eaton Ellipse MAX UPS
| * | | HID: tpkbd: work even if the new Lenovo Keyboard driver is not configuredAndres Freund2012-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | c1dcad2d32d0252e8a3023d20311b52a187ecda3 added a new driver configured by HID_LENOVO_TPKBD but made the hid_have_special_driver entry non-optional which lead to a recognized but non-working device if the new driver wasn't configured (which is the correct default). Signed-off-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | | HID: Only dump input if someone is listeningHenrik Rydberg2012-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Going through the motions of printing the debug message information takes a long time; using the keyboard can lead to a 160 us irqsoff latency. This patch skips hid_dump_input() when there are no open handles, which brings latency down to 100 us. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | | HID: add NOGET quirk for Eaton Ellipse MAX UPSAlan Stern2012-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch (as1603) adds a NOGET quirk for the Eaton Ellipse MAX UPS device. (The USB IDs were already present in hid-ids.h, apparently under a different name.) Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Laurent Bigonville <l.bigonville@edpnet.be> CC: <stable@vger.kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | | Merge tag 'stable/for-linus-3.6-rc4-tag' of ↵Linus Torvalds2012-09-06
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bug-fixes from Konrad Rzeszutek Wilk: * Fix for TLB flushing introduced in v3.6 * Fix Xen-SWIOTLB not using proper DMA mask - device had 64bit but in a 32-bit kernel we need to allocate for coherent pages from a 32-bit pool. * When trying to re-use P2M nodes we had a one-off error and triggered a BUG_ON check with specific CONFIG_ option. * When doing FLR in Xen-PCI-backend we would first do FLR then save the PCI configuration space. We needed to do it the other way around. * tag 'stable/for-linus-3.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pciback: Fix proper FLR steps. xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory.
| * | | | xen/pciback: Fix proper FLR steps.Konrad Rzeszutek Wilk2012-09-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we do FLR and save PCI config we did it in the wrong order. The end result was that if a PCI device was unbind from its driver, then binded to xen-pciback, and then back to its driver we would get: > lspci -s 04:00.0 04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection 13:42:12 # 4 :~/ > echo "0000:04:00.0" > /sys/bus/pci/drivers/pciback/unbind > modprobe e1000e e1000e: Intel(R) PRO/1000 Network Driver - 2.0.0-k e1000e: Copyright(c) 1999 - 2012 Intel Corporation. e1000e 0000:04:00.0: Disabling ASPM L0s L1 e1000e 0000:04:00.0: enabling device (0000 -> 0002) xen: registering gsi 48 triggering 0 polarity 1 Already setup the GSI :48 e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode e1000e: probe of 0000:04:00.0 failed with error -2 This fixes it by first saving the PCI configuration space, then doing the FLR. Reported-by: Ren, Yongjie <yongjie.ren@intel.com> Reported-and-Tested-by: Tobias Geiger <tobias.geiger@vido.info> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: stable@vger.kernel.org
| * | | | xen: Use correct masking in xen_swiotlb_alloc_coherent.Ronny Hegewald2012-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running 32-bit pvops-dom0 and a driver tries to allocate a coherent DMA-memory the xen swiotlb-implementation returned memory beyond 4GB. The underlaying reason is that if the supplied driver passes in a DMA_BIT_MASK(64) ( hwdev->coherent_dma_mask is set to 0xffffffffffffffff) our dma_mask will be u64 set to 0xffffffffffffffff even if we set it to DMA_BIT_MASK(32) previously. Meaning we do not reset the upper bits. By using the dma_alloc_coherent_mask function - it does the proper casting and we get 0xfffffffff. This caused not working sound on a system with 4 GB and a 64-bit compatible sound-card with sets the DMA-mask to 64bit. On bare-metal and the forward-ported xen-dom0 patches from OpenSuse a coherent DMA-memory is always allocated inside the 32-bit address-range by calling dma_alloc_coherent_mask. This patch adds the same functionality to xen swiotlb and is a rebase of the original patch from Ronny Hegewald which never got upstream b/c the underlaying reason was not understood until now. The original email with the original patch is in: http://old-list-archives.xen.org/archives/html/xen-devel/2010-02/msg00038.html the original thread from where the discussion started is in: http://old-list-archives.xen.org/archives/html/xen-devel/2010-01/msg00928.html Signed-off-by: Ronny Hegewald <ronny.hegewald@online.de> Signed-off-by: Stefano Panella <stefano.panella@citrix.com> Acked-By: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: stable@vger.kernel.org
| * | | | xen: fix logical error in tlb flushingAlex Shi2012-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While TLB_FLUSH_ALL gets passed as 'end' argument to flush_tlb_others(), the Xen code was made to check its 'start' parameter. That may give a incorrect op.cmd to MMUEXT_INVLPG_MULTI instead of MMUEXT_TLB_FLUSH_MULTI. Then it causes some page can not be flushed from TLB. This patch fixed this issue. Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Tested-by: Yongjie Ren <yongjie.ren@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | Merge commit '4cb38750d49010ae72e718d46605ac9ba5a851b4' into ↵Konrad Rzeszutek Wilk2012-09-05
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stable/for-linus-3.6 * commit '4cb38750d49010ae72e718d46605ac9ba5a851b4': (6849 commits) bcma: fix invalid PMU chip control masks [libata] pata_cmd64x: whitespace cleanup libata-acpi: fix up for acpi_pm_device_sleep_state API sata_dwc_460ex: device tree may specify dma_channel ahci, trivial: fixed coding style issues related to braces ahci_platform: add hibernation callbacks libata-eh.c: local functions should not be exposed globally libata-transport.c: local functions should not be exposed globally sata_dwc_460ex: support hardreset ata: use module_pci_driver drivers/ata/pata_pcmcia.c: adjust suspicious bit operation pata_imx: Convert to clk_prepare_enable/clk_disable_unprepare ahci: Enable SB600 64bit DMA on MSI K9AGM2 (MS-7327) v2 [libata] Prevent interface errors with Seagate FreeAgent GoFlex drivers/acpi/glue: revert accidental license-related 6b66d95895c bits libata-acpi: add missing inlines in libata.h i2c-omap: Add support for I2C_M_STOP message flag i2c: Fall back to emulated SMBus if the operation isn't supported natively i2c: Add SCCB support i2c-tiny-usb: Add support for the Robofuzz OSIF USB/I2C converter ...
| * | | | | xen/p2m: Fix one-off error in checking the P2M tree directory.Konrad Rzeszutek Wilk2012-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES inclusive) when trying to figure out whether we can re-use some of the P2M middle leafs. Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512 we would try to use the 512th entry. Fortunately for us the p2m_top_index has a check for this: BUG_ON(pfn >= MAX_P2M_PFN); which we hit and saw this: (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.1.2-OVM x86_64 debug=n Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff819cadeb>] (XEN) RFLAGS: 0000000000000212 EM: 1 CONTEXT: pv guest (XEN) rax: ffffffff81db5000 rbx: ffffffff81db4000 rcx: 0000000000000000 (XEN) rdx: 0000000000480211 rsi: 0000000000000000 rdi: ffffffff81db4000 (XEN) rbp: ffffffff81793db8 rsp: ffffffff81793d38 r8: 0000000008000000 (XEN) r9: 4000000000000000 r10: 0000000000000000 r11: ffffffff81db7000 (XEN) r12: 0000000000000ff8 r13: ffffffff81df1ff8 r14: ffffffff81db6000 (XEN) r15: 0000000000000ff8 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 0000000661795000 cr2: 0000000000000000 Fixes-Oracle-Bug: 14570662 CC: stable@vger.kernel.org # only for v3.5 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | | | Merge tag '3.6-pci-fixes' of ↵Linus Torvalds2012-09-06
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Power management - PCI/PM: Enable D3/D3cold by default for most devices - PCI/PM: Keep parent bridge active when probing device - PCI/PM: Fix config reg access for D3cold and bridge suspending - PCI/PM: Add ABI document for sysfs file d3cold_allowed Core - PCI: Don't print anything while decoding is disabled" * tag '3.6-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Don't print anything while decoding is disabled PCI/PM: Add ABI document for sysfs file d3cold_allowed PCI/PM: Fix config reg access for D3cold and bridge suspending PCI/PM: Keep parent bridge active when probing device PCI/PM: Enable D3/D3cold by default for most devices
| * | | | | | PCI: Don't print anything while decoding is disabledBjorn Helgaas2012-08-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we try to print to the console device while its decoding is disabled, the system will hang. Reported-and-tested-by: Olof Johansson <olof@lixom.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Olof Johansson <olof@lixom.net>
| * | | | | | PCI/PM: Add ABI document for sysfs file d3cold_allowedHuang Ying2012-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds ABI document for the following sysfs file: /sys/bus/pci/devices/.../d3cold_allowed Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
| * | | | | | PCI/PM: Fix config reg access for D3cold and bridge suspendingHuang Ying2012-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following bug: http://marc.info/?l=linux-pci&m=134338059022620&w=2 Where lspci does not work properly if a device and the corresponding parent bridge (such as PCIe port) is suspended. This is because the device configuration space registers will be not accessible if the corresponding parent bridge is suspended or the device is put into D3cold state. To solve the issue, the bridge/PCIe port connected to the device is put into active state before read/write configuration space registers. If the device is in D3cold state, it will be put into active state too. To avoid resume/suspend PCIe port for each configuration register read/write, a small delay is added before the PCIe port to go suspended. Reported-by: Bjorn Mork <bjorn@mork.no> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
| * | | | | | PCI/PM: Keep parent bridge active when probing deviceHuang Ying2012-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following bug: http://marc.info/?l=linux-pci&m=134329923124234&w=2 The root cause of the bug is as follow. If a device is not bound with the corresponding driver, the device runtime PM will be disabled and the device will be put into suspended state. So that, the bridge/PCIe port connected to it may be put into suspended and low power state. When do probing for the device later, because the bridge/PCIe port connected to it is in low power state, the IO access to device may fail. To solve the issue, the bridge/PCIe port connected to the device is put into active state before probing. Reported-by: Bjorn Mork <bjorn@mork.no> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
| * | | | | | PCI/PM: Enable D3/D3cold by default for most devicesHuang Ying2012-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following bug: http://marc.info/?l=linux-usb&m=134318961120825&w=2 Originally, device lower power states include D1, D2, D3. After that, D3 is further divided into D3hot and D3cold. To support both scenario safely, original D3 is mapped to D3cold. When adding D3cold support, because worry about some device may have broken D3cold support, D3cold is disabled by default. This disable D3 on original platform too. But some original platform may only have working D3, but no working D1, D2. The root cause of the above bug is it too. To deal with this, this patch enables D3/D3cold by default for most devices. This restores the original behavior. For some devices that suspected to have broken D3cold support, such as PCIe port, D3cold is disabled by default. Reported-by: Bjorn Mork <bjorn@mork.no> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>