aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* PM / domains: Add perf_state attribute to genpd debugfsRajendra Nayak2018-05-30
| | | | | | | | | | | | Now that genpd supports performance states, add this additional attribute as part of the power domains debugfs entry, to display the current performance state for the Power domain. Suggested-by: David Collins <collinsd@codeaurora.org> Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Merge branch 'opp/linux-next' of ↵Rafael J. Wysocki2018-05-30
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp Pull more OPP updates for v4.18 from Viresh Kumar. * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: OPP: Allow same OPP table to be used for multiple genpd PM / OPP: Fix shared OPP table support in dev_pm_opp_register_set_opp_helper() PM / OPP: Fix shared OPP table support in dev_pm_opp_set_regulators() PM / OPP: Fix shared OPP table support in dev_pm_opp_set_prop_name() PM / OPP: Fix shared OPP table support in dev_pm_opp_set_supported_hw()
| * OPP: Allow same OPP table to be used for multiple genpdViresh Kumar2018-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OPP binding says: Property: operating-points-v2 ... This can contain more than one phandle for power domain providers that provide multiple power domains. That is, one phandle for each power domain. If only one phandle is available, then the same OPP table will be used for all power domains provided by the power domain provider. But the OPP core isn't allowing the same OPP table to be used for multiple domains. Update dev_pm_opp_of_add_table_indexed() to allow that. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Rajendra Nayak <rnayak@codeaurora.org>
| * PM / OPP: Fix shared OPP table support in dev_pm_opp_register_set_opp_helper()Viresh Kumar2018-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It should be fine to call dev_pm_opp_register_set_opp_helper() for all possible CPUs, even if some of them share the OPP table as the caller may not be aware of sharing policy. Lets increment the reference count of the OPP table and return its pointer. The caller need to call dev_pm_opp_register_put_opp_helper() the same number of times later on to drop all the references. To avoid adding another counter to count how many times dev_pm_opp_register_set_opp_helper() is called for the same OPP table, dev_pm_opp_register_put_opp_helper() frees the resources on the very first call made to it, assuming that the caller would be calling it sequentially for all the CPUs. We can revisit that if that assumption is broken in the future. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
| * PM / OPP: Fix shared OPP table support in dev_pm_opp_set_regulators()Viresh Kumar2018-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It should be fine to call dev_pm_opp_set_regulators() for all possible CPUs, even if some of them share the OPP table as the caller may not be aware of sharing policy. Lets increment the reference count of the OPP table and return its pointer. The caller need to call dev_pm_opp_put_regulators() the same number of times later on to drop all the references. To avoid adding another counter to count how many times dev_pm_opp_set_regulators() is called for the same OPP table, dev_pm_opp_put_regulators() frees the resources on the very first call made to it, assuming that the caller would be calling it sequentially for all the CPUs. We can revisit that if that assumption is broken in the future. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
| * PM / OPP: Fix shared OPP table support in dev_pm_opp_set_prop_name()Viresh Kumar2018-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It should be fine to call dev_pm_opp_set_prop_name() for all possible CPUs, even if some of them share the OPP table as the caller may not be aware of sharing policy. Lets increment the reference count of the OPP table and return its pointer. The caller need to call dev_pm_opp_put_prop_name() the same number of times later on to drop all the references. To avoid adding another counter to count how many times dev_pm_opp_set_prop_name() is called for the same OPP table, dev_pm_opp_put_prop_name() frees the resources on the very first call made to it, assuming that the caller would be calling it sequentially for all the CPUs. We can revisit that if that assumption is broken in the future. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
| * PM / OPP: Fix shared OPP table support in dev_pm_opp_set_supported_hw()Viresh Kumar2018-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It should be fine to call dev_pm_opp_set_supported_hw() for all possible CPUs, even if some of them share the OPP table as the caller may not be aware of sharing policy. Lets increment the reference count of the OPP table and return its pointer. The caller need to call dev_pm_opp_put_supported_hw() the same number of times later on to drop all the references. To avoid adding another counter to count how many times dev_pm_opp_set_supported_hw() is called for the same OPP table, dev_pm_opp_put_supported_hw() frees the resources on the very first call made to it, assuming that the caller would be calling it sequentially for all the CPUs. We can revisit that if that assumption is broken in the future. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | PM / Domain: Return 0 on error from of_genpd_opp_to_performance_state()Viresh Kumar2018-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | of_genpd_opp_to_performance_state() should return 0 on errors, as its doc comment describes. While it follows that mostly, it returns a negative error number on one of the failures. Fix that. Fixes: 6e41766a6a50 "PM / Domain: Implement of_genpd_opp_to_performance_state()" Reported-by: Rajendra Nayak <rnayak@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | Merge branch 'opp/linux-next' of ↵Rafael J. Wysocki2018-05-17
|\| | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp Merge an OPP fix for v4.18 from Viresh Kumar. * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: PM / OPP: silence an uninitialized variable warning
| * PM / OPP: silence an uninitialized variable warningDan Carpenter2018-05-16
| | | | | | | | | | | | | | | | | | | | Smatch complains that it's possible we print "rate" in the debug output when it hasn't been initialized. It should be zero on that path. Fixes: a1e8c13600bf ("PM / OPP: "opp-hz" is optional for power domains") [ Viresh: Added the Fixes tag ] Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
* | Merge branch 'opp/genpd-pstate-updates' of ↵Rafael J. Wysocki2018-05-14
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull Operating Performance Points (OPP) library changes for v4.18 from Viresh Kumar. * 'opp/genpd-pstate-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: PM / OPP: Remove dev_pm_opp_{un}register_get_pstate_helper() PM / OPP: Get performance state using genpd helper PM / Domain: Implement of_genpd_opp_to_performance_state() PM / Domain: Add support to parse domain's OPP table PM / Domain: Add struct device to genpd PM / OPP: Implement dev_pm_opp_get_of_node() PM / OPP: Implement of_dev_pm_opp_find_required_opp() PM / OPP: Implement dev_pm_opp_of_add_table_indexed() PM / OPP: "opp-hz" is optional for power domains PM / OPP: dt-bindings: Make "opp-hz" optional for power domains PM / OPP: dt-bindings: Rename "required-opp" as "required-opps" soc/tegra: pmc: Don't allocate struct tegra_powergate on stack
| * PM / OPP: Remove dev_pm_opp_{un}register_get_pstate_helper()Viresh Kumar2018-05-09
| | | | | | | | | | | | | | These helpers aren't used anymore, remove them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / OPP: Get performance state using genpd helperViresh Kumar2018-05-09
| | | | | | | | | | | | | | | | The genpd core provides an API now to retrieve the performance state from DT, use that instead of the ->get_pstate() callback. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / Domain: Implement of_genpd_opp_to_performance_state()Viresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements of_genpd_opp_to_performance_state() which can be used from the device drivers or the OPP core to find the performance state encoded in the "required-opps" property of a node. Normally this would be called only once for each OPP of the device for which the OPP table of the device is getting generated. Different platforms may encode the performance state differently using the OPP table (they may simply return value of opp-hz or opp-microvolt, or apply some algorithm on top of those values) and so a new callback ->opp_to_performance_state() is implemented to allow platform specific drivers to convert the power domain OPP to a performance state value. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / Domain: Add support to parse domain's OPP tableViresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic power domains can have an OPP table for themselves now, and phandle of their OPP nodes can be used by the devices powered by the domain. In order for the OPP core to translate requirements between the devices and their power domains, both need to have an OPP table in kernel. Parse the OPP table for power domains if they have their set_performance_state() callback set. With this patch, an OPP table would be created for the genpd in kernel based on the OPP table present in DT, if the genpd have its set_performance_state() callback set. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / Domain: Add struct device to genpdViresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | | | The power-domain core would be using the OPP core going forward and the OPP core has the basic requirement of a device structure for its working. Add a struct device to the genpd structure. This doesn't register the device with device core as the "dev" pointer is mostly used by the OPP core as a cookie for now and registering the device is not mandatory. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / OPP: Implement dev_pm_opp_get_of_node()Viresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | This adds a new helper to let the power domain drivers to access opp->np, so that they can read platform specific properties from the node. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / OPP: Implement of_dev_pm_opp_find_required_opp()Viresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | A device's DT node or its OPP nodes can contain a phandle to other device's OPP node, in the "required-opps" property. This patch implements a routine to find that required OPP from the node that contains the "required-opps" property. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / OPP: Implement dev_pm_opp_of_add_table_indexed()Viresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | The "operating-points-v2" property can contain a list of phandles now, specifically for the power domain providers that provide multiple domains. Add support to parse that. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / OPP: "opp-hz" is optional for power domainsViresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "opp-hz" property is optional for power domains now and we shouldn't error out if it is missing for power domains. This patch creates two new routines, _get_opp_count() and _opp_is_duplicate(), by separating existing code from their parent functions. Also skip duplicate OPP check for power domain OPPs as they may not have any the "opp-hz" field, but a platform specific performance state binding to uniquely identify OPP nodes. By default the debugfs OPP nodes are named using the "rate" value, but that isn't possible for the power domain OPP nodes and hence they use the index of the OPP node in the OPP node list instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| * PM / OPP: dt-bindings: Make "opp-hz" optional for power domainsViresh Kumar2018-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | The "opp-hz" property is not relevant across all the devices that use the OPP tables now. For example, for a power domain a frequency value wouldn't mean anything. Though they must have another property, which may be implementation defined, which uniquely identifies the OPP nodes. Make "opp-hz" optional for such devices. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org>
| * PM / OPP: dt-bindings: Rename "required-opp" as "required-opps"Viresh Kumar2018-05-08
| | | | | | | | | | | | | | | | | | | | This property can contain more than one phandle and it must be named "required-opps" instead. Suggested-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org>
| * soc/tegra: pmc: Don't allocate struct tegra_powergate on stackViresh Kumar2018-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | With a later commit an instance of the struct device will be added to struct genpd and with that the size of the struct tegra_powergate will be over 1024 bytes. That generates following warning: drivers/soc/tegra/pmc.c:579:1: warning: the frame size of 1200 bytes is larger than 1024 bytes [-Wframe-larger-than=] Avoid such warnings by allocating the structure dynamically. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Thierry Reding <treding@nvidia.com>
* | spi: Respect all error codes from dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | | | The limitation of being able to check only for -EPROBE_DEFER from dev_pm_domain_attach() has been removed. Hence let's respect all error codes and bail out accordingly. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | soundwire: Respect all error codes from dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | | | The limitation of being able to check only for -EPROBE_DEFER from dev_pm_domain_attach() has been removed. Hence let's respect all error codes and bail out accordingly. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | mmc: sdio: Respect all error codes from dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | The limitation of being able to check only for -EPROBE_DEFER from dev_pm_domain_attach() has been removed. Hence let's respect all error codes and bail out accordingly. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | i2c: Respect all error codes from dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | | | The limitation of being able to check only for -EPROBE_DEFER from dev_pm_domain_attach() has been removed. Hence let's respect all error codes and bail out accordingly. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | driver core: Respect all error codes from dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | | | The limitation of being able to check only for -EPROBE_DEFER from dev_pm_domain_attach() has been removed. Hence let's respect all error codes and bail out accordingly. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | amba: Respect all error codes from dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | The limitation of being able to check only for -EPROBE_DEFER from dev_pm_domain_attach() has been removed. Hence let's respect all error codes and bail out accordingly. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | PM / Domains: Allow a better error handling of dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The callers of dev_pm_domain_attach() currently checks the returned error code for -EPROBE_DEFER and needs to ignore other error codes. This is an unnecessary limitation, which also leads to a rather strange behaviour in the error path. Address this limitation, by changing the return codes from acpi_dev_pm_attach() and genpd_dev_pm_attach(). More precisely, let them return 0, when no PM domain is needed for the device and then return 1, in case the device was successfully attached to its PM domain. In this way, dev_pm_domain_attach(), gets a better understanding of what happens in the attach attempts and also allowing its caller to better act on real errors codes. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | PM / Domains: Check for existing PM domain in dev_pm_domain_attach()Ulf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | | | Instead of checking if an existing PM domain pointer has been assigned in genpd_dev_pm_attach() and acpi_dev_pm_attach(), move the check to the common path in dev_pm_domain_attach(), thus potentially avoid one unnecessary check. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | PM / Domains: Drop redundant code in genpd while attaching devicesUlf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | The driver core together with the PM core, nowadays deals with deferring all probes during the device system sleep phases. Therefore genpd no longer need to care about this situation, so let's drop the corresponding code. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | PM / Domains: Drop comment in genpd about legacy Samsung DT bindingUlf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | The parsing of the Samsung specific DT binding is gone, but the comment in the function header remained. Let's drop the comment to avoid confusions. Fixes: 001d50c9a14f (PM / Domains: Remove obsolete "samsung,power-domain" check) Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | PM / Domains: Fix error path during attach in genpdUlf Hansson2018-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case the PM domain fails to be powered on in genpd_dev_pm_attach(), it returns -EPROBE_DEFER, but keeping the device attached to its PM domain. This leads to problems when the next attempt to attach is re-tried. More precisely, in that situation an -EEXIST error code is returned, because the device already has its PM domain pointer assigned, from the first attempt. Now, because of the sloppy error handling by the existing callers of dev_pm_domain_attach(), probing is allowed to continue when -EEXIST is returned. However, in such case there are no guarantees that the PM domain is powered on by genpd, which may lead to hangs when buses/drivers tried to access their devices. Let's fix this behaviour, simply by detaching the device when powering on fails in genpd_dev_pm_attach(). Cc: v4.11+ <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | Linux 4.17-rc5Linus Torvalds2018-05-13
| |
* | Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2018-05-13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/pti updates from Thomas Gleixner: "A mixed bag of fixes and updates for the ghosts which are hunting us. The scheduler fixes have been pulled into that branch to avoid conflicts. - A set of fixes to address a khread_parkme() race which caused lost wakeups and loss of state. - A deadlock fix for stop_machine() solved by moving the wakeups outside of the stopper_lock held region. - A set of Spectre V1 array access restrictions. The possible problematic spots were discuvered by Dan Carpenters new checks in smatch. - Removal of an unused file which was forgotten when the rest of that functionality was removed" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Remove unused file perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[] sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] sched/core: Introduce set_special_state() kthread, sched/wait: Fix kthread_parkme() completion issue kthread, sched/wait: Fix kthread_parkme() wait-loop sched/fair: Fix the update of blocked load when newly idle stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
| * | x86/vdso: Remove unused fileJann Horn2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit da861e18eccc ("x86, vdso: Get rid of the fake section mechanism") left this file behind; nothing is using it anymore. Signed-off-by: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Link: http://lkml.kernel.org/r/20180504175935.104085-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msrPeter Zijlstra2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap) Userspace controls @attr, sanitize cfg (attr->config) before using it to index an array. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driverPeter Zijlstra2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > arch/x86/events/msr.c:178 msr_event_init() warn: potential spectre issue 'msr' (local cap) Userspace controls @attr, sanitize cfg (attr->config) before using it to index an array. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()Peter Zijlstra2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap) > arch/x86/events/intel/core.c:337 intel_pmu_event_map() warn: potential spectre issue 'intel_perfmon_event_map' > arch/x86/events/intel/knc.c:122 knc_pmu_event_map() warn: potential spectre issue 'knc_perfmon_event_map' > arch/x86/events/intel/p4.c:722 p4_pmu_event_map() warn: potential spectre issue 'p4_general_events' > arch/x86/events/intel/p6.c:116 p6_pmu_event_map() warn: potential spectre issue 'p6_perfmon_event_map' > arch/x86/events/amd/core.c:132 amd_pmu_event_map() warn: potential spectre issue 'amd_perfmon_event_map' Userspace controls @attr, sanitize @attr->config before passing it on to x86_pmu::event_map(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*Peter Zijlstra2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids[cache_type]' (local cap) > arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids' (local cap) > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs[cache_type]' (local cap) > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs' (local cap) Userspace controls @config which contains 3 (byte) fields used for a 3 dimensional array deref. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]Peter Zijlstra2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > kernel/events/ring_buffer.c:871 perf_mmap_to_page() warn: potential spectre issue 'rb->aux_pages' Userspace controls @pgoff through the fault address. Sanitize the array index before doing the array dereference. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]Peter Zijlstra2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > kernel/sched/autogroup.c:230 proc_sched_autogroup_set_nice() warn: potential spectre issue 'sched_prio_to_weight' Userspace controls @nice, sanitize the array index. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]Peter Zijlstra2018-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > kernel/sched/core.c:6921 cpu_weight_nice_write_s64() warn: potential spectre issue 'sched_prio_to_weight' Userspace controls @nice, so sanitize the value before using it to index an array. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/core: Introduce set_special_state()Peter Zijlstra2018-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gaurav reported a perceived problem with TASK_PARKED, which turned out to be a broken wait-loop pattern in __kthread_parkme(), but the reported issue can (and does) in fact happen for states that do not do condition based sleeps. When the 'current->state = TASK_RUNNING' store of a previous (concurrent) try_to_wake_up() collides with the setting of a 'special' sleep state, we can loose the sleep state. Normal condition based wait-loops are immune to this problem, but for sleep states that are not condition based are subject to this problem. There already is a fix for TASK_DEAD. Abstract that and also apply it to TASK_STOPPED and TASK_TRACED, both of which are also without condition based wait-loop. Reported-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | kthread, sched/wait: Fix kthread_parkme() completion issuePeter Zijlstra2018-05-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even with the wait-loop fixed, there is a further issue with kthread_parkme(). Upon hotplug, when we do takedown_cpu(), smpboot_park_threads() can return before all those threads are in fact blocked, due to the placement of the complete() in __kthread_parkme(). When that happens, sched_cpu_dying() -> migrate_tasks() can end up migrating such a still runnable task onto another CPU. Normally the task will have hit schedule() and gone to sleep by the time we do kthread_unpark(), which will then do __kthread_bind() to re-bind the task to the correct CPU. However, when we loose the initial TASK_PARKED store to the concurrent wakeup issue described previously, do the complete(), get migrated, it is possible to either: - observe kthread_unpark()'s clearing of SHOULD_PARK and terminate the park and set TASK_RUNNING, or - __kthread_bind()'s wait_task_inactive() to observe the competing TASK_RUNNING store. Either way the WARN() in __kthread_bind() will trigger and fail to correctly set the CPU affinity. Fix this by only issuing the complete() when the kthread has scheduled out. This does away with all the icky 'still running' nonsense. The alternative is to promote TASK_PARKED to a special state, this guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING and we'll end up doing the right thing, but this preserves the whole icky business of potentially migating the still runnable thing. Reported-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | kthread, sched/wait: Fix kthread_parkme() wait-loopPeter Zijlstra2018-05-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gaurav reported a problem with __kthread_parkme() where a concurrent try_to_wake_up() could result in competing stores to ->state which, when the TASK_PARKED store got lost bad things would happen. The comment near set_current_state() actually mentions this competing store, but only mentions the case against TASK_RUNNING. This same store, with different timing, can happen against a subsequent !RUNNING store. This normally is not a problem, because as per that same comment, the !RUNNING state store is inside a condition based wait-loop: for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (!need_sleep) break; schedule(); } __set_current_state(TASK_RUNNING); If we loose the (first) TASK_UNINTERRUPTIBLE store to a previous (concurrent) wakeup, the schedule() will NO-OP and we'll go around the loop once more. The problem here is that the TASK_PARKED store is not inside the KTHREAD_SHOULD_PARK condition wait-loop. There is a genuine issue with sleeps that do not have a condition; this is addressed in a subsequent patch. Reported-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/fair: Fix the update of blocked load when newly idleVincent Guittot2018-05-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit: 31e77c93e432 ("sched/fair: Update blocked load when newly idle") ... we release the rq->lock when updating blocked load of idle CPUs. This opens a time window during which another CPU can add a task to this CPU's cfs_rq. The check for newly added task of idle_balance() is not in the common path. Move the out label to include this check. Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 31e77c93e432 ("sched/fair: Update blocked load when newly idle") Link: http://lkml.kernel.org/r/20180426103133.GA6953@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlockPeter Zijlstra2018-05-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Matt reported the following deadlock: CPU0 CPU1 schedule(.prev=migrate/0) <fault> pick_next_task() ... idle_balance() migrate_swap() active_balance() stop_two_cpus() spin_lock(stopper0->lock) spin_lock(stopper1->lock) ttwu(migrate/0) smp_cond_load_acquire() -- waits for schedule() stop_one_cpu(1) spin_lock(stopper1->lock) -- waits for stopper lock Fix this deadlock by taking the wakeups out from under stopper->lock. This allows the active_balance() to queue the stop work and finish the context switch, which in turn allows the wakeup from migrate_swap() to observe the context and complete the wakeup. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2018-05-13
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Revert the new NUMA aware placement approach which turned out to create more problems than it solved" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"