aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2010-10-21 17:53:17 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2010-10-21 17:53:17 -0400
commita8cbf22559ceefdcdfac00701e8e6da7518b7e8e (patch)
tree63ebd5779a37f809f7daed77dbf27aa3f1e1110c /Documentation
parente36f561a2c88394ef2708f1ab300fe8a79e9f651 (diff)
parent9c034392533f3e9f00656d5c58478cff2560ef81 (diff)
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (26 commits) PM / Wakeup: Show wakeup sources statistics in debugfs PM: Introduce library for device-specific OPPs (v7) PM: Add sysfs attr for rechecking dev hash from PM trace PM: Lock PM device list mutex in show_dev_hash() PM / Runtime: Remove idle notification after failing suspend PM / Hibernate: Modify signature used to mark swap PM / Runtime: Reduce code duplication in core helper functions PM: Allow wakeup events to abort freezing of tasks PM: runtime: add missed pm_request_autosuspend PM / Hibernate: Make some boot messages look less scary PM / Runtime: Implement autosuspend support PM / Runtime: Add no_callbacks flag PM / Runtime: Combine runtime PM entry points PM / Runtime: Merge synchronous and async runtime routines PM / Runtime: Replace boolean arguments with bitflags PM / Runtime: Move code in drivers/base/power/runtime.c sysfs: Add sysfs_merge_group() and sysfs_unmerge_group() PM: Fix potential issue with failing asynchronous suspend PM / Wakeup: Introduce wakeup source objects and event statistics (v3) PM: Fix signed/unsigned warning in dpm_show_time() ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-devices-power88
-rw-r--r--Documentation/ABI/testing/sysfs-power29
-rw-r--r--Documentation/kernel-parameters.txt5
-rw-r--r--Documentation/power/00-INDEX2
-rw-r--r--Documentation/power/interface.txt2
-rw-r--r--Documentation/power/opp.txt375
-rw-r--r--Documentation/power/runtime_pm.txt227
-rw-r--r--Documentation/power/s2ram.txt7
-rw-r--r--Documentation/power/swsusp.txt3
9 files changed, 729 insertions, 9 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power
index 6123c523bfd7..7628cd1bc36a 100644
--- a/Documentation/ABI/testing/sysfs-devices-power
+++ b/Documentation/ABI/testing/sysfs-devices-power
@@ -77,3 +77,91 @@ Description:
77 devices this attribute is set to "enabled" by bus type code or 77 devices this attribute is set to "enabled" by bus type code or
78 device drivers and in that cases it should be safe to leave the 78 device drivers and in that cases it should be safe to leave the
79 default value. 79 default value.
80
81What: /sys/devices/.../power/wakeup_count
82Date: September 2010
83Contact: Rafael J. Wysocki <rjw@sisk.pl>
84Description:
85 The /sys/devices/.../wakeup_count attribute contains the number
86 of signaled wakeup events associated with the device. This
87 attribute is read-only. If the device is not enabled to wake up
88 the system from sleep states, this attribute is empty.
89
90What: /sys/devices/.../power/wakeup_active_count
91Date: September 2010
92Contact: Rafael J. Wysocki <rjw@sisk.pl>
93Description:
94 The /sys/devices/.../wakeup_active_count attribute contains the
95 number of times the processing of wakeup events associated with
96 the device was completed (at the kernel level). This attribute
97 is read-only. If the device is not enabled to wake up the
98 system from sleep states, this attribute is empty.
99
100What: /sys/devices/.../power/wakeup_hit_count
101Date: September 2010
102Contact: Rafael J. Wysocki <rjw@sisk.pl>
103Description:
104 The /sys/devices/.../wakeup_hit_count attribute contains the
105 number of times the processing of a wakeup event associated with
106 the device might prevent the system from entering a sleep state.
107 This attribute is read-only. If the device is not enabled to
108 wake up the system from sleep states, this attribute is empty.
109
110What: /sys/devices/.../power/wakeup_active
111Date: September 2010
112Contact: Rafael J. Wysocki <rjw@sisk.pl>
113Description:
114 The /sys/devices/.../wakeup_active attribute contains either 1,
115 or 0, depending on whether or not a wakeup event associated with
116 the device is being processed (1). This attribute is read-only.
117 If the device is not enabled to wake up the system from sleep
118 states, this attribute is empty.
119
120What: /sys/devices/.../power/wakeup_total_time_ms
121Date: September 2010
122Contact: Rafael J. Wysocki <rjw@sisk.pl>
123Description:
124 The /sys/devices/.../wakeup_total_time_ms attribute contains
125 the total time of processing wakeup events associated with the
126 device, in milliseconds. This attribute is read-only. If the
127 device is not enabled to wake up the system from sleep states,
128 this attribute is empty.
129
130What: /sys/devices/.../power/wakeup_max_time_ms
131Date: September 2010
132Contact: Rafael J. Wysocki <rjw@sisk.pl>
133Description:
134 The /sys/devices/.../wakeup_max_time_ms attribute contains
135 the maximum time of processing a single wakeup event associated
136 with the device, in milliseconds. This attribute is read-only.
137 If the device is not enabled to wake up the system from sleep
138 states, this attribute is empty.
139
140What: /sys/devices/.../power/wakeup_last_time_ms
141Date: September 2010
142Contact: Rafael J. Wysocki <rjw@sisk.pl>
143Description:
144 The /sys/devices/.../wakeup_last_time_ms attribute contains
145 the value of the monotonic clock corresponding to the time of
146 signaling the last wakeup event associated with the device, in
147 milliseconds. This attribute is read-only. If the device is
148 not enabled to wake up the system from sleep states, this
149 attribute is empty.
150
151What: /sys/devices/.../power/autosuspend_delay_ms
152Date: September 2010
153Contact: Alan Stern <stern@rowland.harvard.edu>
154Description:
155 The /sys/devices/.../power/autosuspend_delay_ms attribute
156 contains the autosuspend delay value (in milliseconds). Some
157 drivers do not want their device to suspend as soon as it
158 becomes idle at run time; they want the device to remain
159 inactive for a certain minimum period of time first. That
160 period is called the autosuspend delay. Negative values will
161 prevent the device from being suspended at run time (similar
162 to writing "on" to the power/control attribute). Values >=
163 1000 will cause the autosuspend timer expiration to be rounded
164 up to the nearest second.
165
166 Not all drivers support this attribute. If it isn't supported,
167 attempts to read or write it will yield I/O errors.
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index 2875f1f74a07..194ca446ac28 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -99,9 +99,38 @@ Description:
99 99
100 dmesg -s 1000000 | grep 'hash matches' 100 dmesg -s 1000000 | grep 'hash matches'
101 101
102 If you do not get any matches (or they appear to be false
103 positives), it is possible that the last PM event point
104 referred to a device created by a loadable kernel module. In
105 this case cat /sys/power/pm_trace_dev_match (see below) after
106 your system is started up and the kernel modules are loaded.
107
102 CAUTION: Using it will cause your machine's real-time (CMOS) 108 CAUTION: Using it will cause your machine's real-time (CMOS)
103 clock to be set to a random invalid time after a resume. 109 clock to be set to a random invalid time after a resume.
104 110
111What; /sys/power/pm_trace_dev_match
112Date: October 2010
113Contact: James Hogan <james@albanarts.com>
114Description:
115 The /sys/power/pm_trace_dev_match file contains the name of the
116 device associated with the last PM event point saved in the RTC
117 across reboots when pm_trace has been used. More precisely it
118 contains the list of current devices (including those
119 registered by loadable kernel modules since boot) which match
120 the device hash in the RTC at boot, with a newline after each
121 one.
122
123 The advantage of this file over the hash matches printed to the
124 kernel log (see /sys/power/pm_trace), is that it includes
125 devices created after boot by loadable kernel modules.
126
127 Due to the small hash size necessary to fit in the RTC, it is
128 possible that more than one device matches the hash, in which
129 case further investigation is required to determine which
130 device is causing the problem. Note that genuine RTC clock
131 values (such as when pm_trace has not been used), can still
132 match a device and output it's name here.
133
105What: /sys/power/pm_async 134What: /sys/power/pm_async
106Date: January 2009 135Date: January 2009
107Contact: Rafael J. Wysocki <rjw@sisk.pl> 136Contact: Rafael J. Wysocki <rjw@sisk.pl>
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 3a0009e03d14..02f21d9220ce 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2170,6 +2170,11 @@ and is between 256 and 4096 characters. It is defined in the file
2170 in <PAGE_SIZE> units (needed only for swap files). 2170 in <PAGE_SIZE> units (needed only for swap files).
2171 See Documentation/power/swsusp-and-swap-files.txt 2171 See Documentation/power/swsusp-and-swap-files.txt
2172 2172
2173 hibernate= [HIBERNATION]
2174 noresume Don't check if there's a hibernation image
2175 present during boot.
2176 nocompress Don't compress/decompress hibernation images.
2177
2173 retain_initrd [RAM] Keep initrd memory after extraction 2178 retain_initrd [RAM] Keep initrd memory after extraction
2174 2179
2175 rhash_entries= [KNL,NET] 2180 rhash_entries= [KNL,NET]
diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
index fb742c213c9e..45e9d4a91284 100644
--- a/Documentation/power/00-INDEX
+++ b/Documentation/power/00-INDEX
@@ -14,6 +14,8 @@ interface.txt
14 - Power management user interface in /sys/power 14 - Power management user interface in /sys/power
15notifiers.txt 15notifiers.txt
16 - Registering suspend notifiers in device drivers 16 - Registering suspend notifiers in device drivers
17opp.txt
18 - Operating Performance Point library
17pci.txt 19pci.txt
18 - How the PCI Subsystem Does Power Management 20 - How the PCI Subsystem Does Power Management
19pm_qos_interface.txt 21pm_qos_interface.txt
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
index e67211fe0ee2..c537834af005 100644
--- a/Documentation/power/interface.txt
+++ b/Documentation/power/interface.txt
@@ -57,7 +57,7 @@ smallest image possible. In particular, if "0" is written to this file, the
57suspend image will be as small as possible. 57suspend image will be as small as possible.
58 58
59Reading from this file will display the current image size limit, which 59Reading from this file will display the current image size limit, which
60is set to 500 MB by default. 60is set to 2/5 of available RAM by default.
61 61
62/sys/power/pm_trace controls the code which saves the last PM event point in 62/sys/power/pm_trace controls the code which saves the last PM event point in
63the RTC across reboots, so that you can debug a machine that just hangs 63the RTC across reboots, so that you can debug a machine that just hangs
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
new file mode 100644
index 000000000000..44d87ad3cea9
--- /dev/null
+++ b/Documentation/power/opp.txt
@@ -0,0 +1,375 @@
1*=============*
2* OPP Library *
3*=============*
4
5(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
6
7Contents
8--------
91. Introduction
102. Initial OPP List Registration
113. OPP Search Functions
124. OPP Availability Control Functions
135. OPP Data Retrieval Functions
146. Cpufreq Table Generation
157. Data Structures
16
171. Introduction
18===============
19Complex SoCs of today consists of a multiple sub-modules working in conjunction.
20In an operational system executing varied use cases, not all modules in the SoC
21need to function at their highest performing frequency all the time. To
22facilitate this, sub-modules in a SoC are grouped into domains, allowing some
23domains to run at lower voltage and frequency while other domains are loaded
24more. The set of discrete tuples consisting of frequency and voltage pairs that
25the device will support per domain are called Operating Performance Points or
26OPPs.
27
28OPP library provides a set of helper functions to organize and query the OPP
29information. The library is located in drivers/base/power/opp.c and the header
30is located in include/linux/opp.h. OPP library can be enabled by enabling
31CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
32CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
33optionally boot at a certain OPP without needing cpufreq.
34
35Typical usage of the OPP library is as follows:
36(users) -> registers a set of default OPPs -> (library)
37SoC framework -> modifies on required cases certain OPPs -> OPP layer
38 -> queries to search/retrieve information ->
39
40OPP layer expects each domain to be represented by a unique device pointer. SoC
41framework registers a set of initial OPPs per device with the OPP layer. This
42list is expected to be an optimally small number typically around 5 per device.
43This initial list contains a set of OPPs that the framework expects to be safely
44enabled by default in the system.
45
46Note on OPP Availability:
47------------------------
48As the system proceeds to operate, SoC framework may choose to make certain
49OPPs available or not available on each device based on various external
50factors. Example usage: Thermal management or other exceptional situations where
51SoC framework might choose to disable a higher frequency OPP to safely continue
52operations until that OPP could be re-enabled if possible.
53
54OPP library facilitates this concept in it's implementation. The following
55operational functions operate only on available opps:
56opp_find_freq_{ceil, floor}, opp_get_voltage, opp_get_freq, opp_get_opp_count
57and opp_init_cpufreq_table
58
59opp_find_freq_exact is meant to be used to find the opp pointer which can then
60be used for opp_enable/disable functions to make an opp available as required.
61
62WARNING: Users of OPP library should refresh their availability count using
63get_opp_count if opp_enable/disable functions are invoked for a device, the
64exact mechanism to trigger these or the notification mechanism to other
65dependent subsystems such as cpufreq are left to the discretion of the SoC
66specific framework which uses the OPP library. Similar care needs to be taken
67care to refresh the cpufreq table in cases of these operations.
68
69WARNING on OPP List locking mechanism:
70-------------------------------------------------
71OPP library uses RCU for exclusivity. RCU allows the query functions to operate
72in multiple contexts and this synchronization mechanism is optimal for a read
73intensive operations on data structure as the OPP library caters to.
74
75To ensure that the data retrieved are sane, the users such as SoC framework
76should ensure that the section of code operating on OPP queries are locked
77using RCU read locks. The opp_find_freq_{exact,ceil,floor},
78opp_get_{voltage, freq, opp_count} fall into this category.
79
80opp_{add,enable,disable} are updaters which use mutex and implement it's own
81RCU locking mechanisms. opp_init_cpufreq_table acts as an updater and uses
82mutex to implment RCU updater strategy. These functions should *NOT* be called
83under RCU locks and other contexts that prevent blocking functions in RCU or
84mutex operations from working.
85
862. Initial OPP List Registration
87================================
88The SoC implementation calls opp_add function iteratively to add OPPs per
89device. It is expected that the SoC framework will register the OPP entries
90optimally- typical numbers range to be less than 5. The list generated by
91registering the OPPs is maintained by OPP library throughout the device
92operation. The SoC framework can subsequently control the availability of the
93OPPs dynamically using the opp_enable / disable functions.
94
95opp_add - Add a new OPP for a specific domain represented by the device pointer.
96 The OPP is defined using the frequency and voltage. Once added, the OPP
97 is assumed to be available and control of it's availability can be done
98 with the opp_enable/disable functions. OPP library internally stores
99 and manages this information in the opp struct. This function may be
100 used by SoC framework to define a optimal list as per the demands of
101 SoC usage environment.
102
103 WARNING: Do not use this function in interrupt context.
104
105 Example:
106 soc_pm_init()
107 {
108 /* Do things */
109 r = opp_add(mpu_dev, 1000000, 900000);
110 if (!r) {
111 pr_err("%s: unable to register mpu opp(%d)\n", r);
112 goto no_cpufreq;
113 }
114 /* Do cpufreq things */
115 no_cpufreq:
116 /* Do remaining things */
117 }
118
1193. OPP Search Functions
120=======================
121High level framework such as cpufreq operates on frequencies. To map the
122frequency back to the corresponding OPP, OPP library provides handy functions
123to search the OPP list that OPP library internally manages. These search
124functions return the matching pointer representing the opp if a match is
125found, else returns error. These errors are expected to be handled by standard
126error checks such as IS_ERR() and appropriate actions taken by the caller.
127
128opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
129 availability. This function is especially useful to enable an OPP which
130 is not available by default.
131 Example: In a case when SoC framework detects a situation where a
132 higher frequency could be made available, it can use this function to
133 find the OPP prior to call the opp_enable to actually make it available.
134 rcu_read_lock();
135 opp = opp_find_freq_exact(dev, 1000000000, false);
136 rcu_read_unlock();
137 /* dont operate on the pointer.. just do a sanity check.. */
138 if (IS_ERR(opp)) {
139 pr_err("frequency not disabled!\n");
140 /* trigger appropriate actions.. */
141 } else {
142 opp_enable(dev,1000000000);
143 }
144
145 NOTE: This is the only search function that operates on OPPs which are
146 not available.
147
148opp_find_freq_floor - Search for an available OPP which is *at most* the
149 provided frequency. This function is useful while searching for a lesser
150 match OR operating on OPP information in the order of decreasing
151 frequency.
152 Example: To find the highest opp for a device:
153 freq = ULONG_MAX;
154 rcu_read_lock();
155 opp_find_freq_floor(dev, &freq);
156 rcu_read_unlock();
157
158opp_find_freq_ceil - Search for an available OPP which is *at least* the
159 provided frequency. This function is useful while searching for a
160 higher match OR operating on OPP information in the order of increasing
161 frequency.
162 Example 1: To find the lowest opp for a device:
163 freq = 0;
164 rcu_read_lock();
165 opp_find_freq_ceil(dev, &freq);
166 rcu_read_unlock();
167 Example 2: A simplified implementation of a SoC cpufreq_driver->target:
168 soc_cpufreq_target(..)
169 {
170 /* Do stuff like policy checks etc. */
171 /* Find the best frequency match for the req */
172 rcu_read_lock();
173 opp = opp_find_freq_ceil(dev, &freq);
174 rcu_read_unlock();
175 if (!IS_ERR(opp))
176 soc_switch_to_freq_voltage(freq);
177 else
178 /* do something when we cant satisfy the req */
179 /* do other stuff */
180 }
181
1824. OPP Availability Control Functions
183=====================================
184A default OPP list registered with the OPP library may not cater to all possible
185situation. The OPP library provides a set of functions to modify the
186availability of a OPP within the OPP list. This allows SoC frameworks to have
187fine grained dynamic control of which sets of OPPs are operationally available.
188These functions are intended to *temporarily* remove an OPP in conditions such
189as thermal considerations (e.g. don't use OPPx until the temperature drops).
190
191WARNING: Do not use these functions in interrupt context.
192
193opp_enable - Make a OPP available for operation.
194 Example: Lets say that 1GHz OPP is to be made available only if the
195 SoC temperature is lower than a certain threshold. The SoC framework
196 implementation might choose to do something as follows:
197 if (cur_temp < temp_low_thresh) {
198 /* Enable 1GHz if it was disabled */
199 rcu_read_lock();
200 opp = opp_find_freq_exact(dev, 1000000000, false);
201 rcu_read_unlock();
202 /* just error check */
203 if (!IS_ERR(opp))
204 ret = opp_enable(dev, 1000000000);
205 else
206 goto try_something_else;
207 }
208
209opp_disable - Make an OPP to be not available for operation
210 Example: Lets say that 1GHz OPP is to be disabled if the temperature
211 exceeds a threshold value. The SoC framework implementation might
212 choose to do something as follows:
213 if (cur_temp > temp_high_thresh) {
214 /* Disable 1GHz if it was enabled */
215 rcu_read_lock();
216 opp = opp_find_freq_exact(dev, 1000000000, true);
217 rcu_read_unlock();
218 /* just error check */
219 if (!IS_ERR(opp))
220 ret = opp_disable(dev, 1000000000);
221 else
222 goto try_something_else;
223 }
224
2255. OPP Data Retrieval Functions
226===============================
227Since OPP library abstracts away the OPP information, a set of functions to pull
228information from the OPP structure is necessary. Once an OPP pointer is
229retrieved using the search functions, the following functions can be used by SoC
230framework to retrieve the information represented inside the OPP layer.
231
232opp_get_voltage - Retrieve the voltage represented by the opp pointer.
233 Example: At a cpufreq transition to a different frequency, SoC
234 framework requires to set the voltage represented by the OPP using
235 the regulator framework to the Power Management chip providing the
236 voltage.
237 soc_switch_to_freq_voltage(freq)
238 {
239 /* do things */
240 rcu_read_lock();
241 opp = opp_find_freq_ceil(dev, &freq);
242 v = opp_get_voltage(opp);
243 rcu_read_unlock();
244 if (v)
245 regulator_set_voltage(.., v);
246 /* do other things */
247 }
248
249opp_get_freq - Retrieve the freq represented by the opp pointer.
250 Example: Lets say the SoC framework uses a couple of helper functions
251 we could pass opp pointers instead of doing additional parameters to
252 handle quiet a bit of data parameters.
253 soc_cpufreq_target(..)
254 {
255 /* do things.. */
256 max_freq = ULONG_MAX;
257 rcu_read_lock();
258 max_opp = opp_find_freq_floor(dev,&max_freq);
259 requested_opp = opp_find_freq_ceil(dev,&freq);
260 if (!IS_ERR(max_opp) && !IS_ERR(requested_opp))
261 r = soc_test_validity(max_opp, requested_opp);
262 rcu_read_unlock();
263 /* do other things */
264 }
265 soc_test_validity(..)
266 {
267 if(opp_get_voltage(max_opp) < opp_get_voltage(requested_opp))
268 return -EINVAL;
269 if(opp_get_freq(max_opp) < opp_get_freq(requested_opp))
270 return -EINVAL;
271 /* do things.. */
272 }
273
274opp_get_opp_count - Retrieve the number of available opps for a device
275 Example: Lets say a co-processor in the SoC needs to know the available
276 frequencies in a table, the main processor can notify as following:
277 soc_notify_coproc_available_frequencies()
278 {
279 /* Do things */
280 rcu_read_lock();
281 num_available = opp_get_opp_count(dev);
282 speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL);
283 /* populate the table in increasing order */
284 freq = 0;
285 while (!IS_ERR(opp = opp_find_freq_ceil(dev, &freq))) {
286 speeds[i] = freq;
287 freq++;
288 i++;
289 }
290 rcu_read_unlock();
291
292 soc_notify_coproc(AVAILABLE_FREQs, speeds, num_available);
293 /* Do other things */
294 }
295
2966. Cpufreq Table Generation
297===========================
298opp_init_cpufreq_table - cpufreq framework typically is initialized with
299 cpufreq_frequency_table_cpuinfo which is provided with the list of
300 frequencies that are available for operation. This function provides
301 a ready to use conversion routine to translate the OPP layer's internal
302 information about the available frequencies into a format readily
303 providable to cpufreq.
304
305 WARNING: Do not use this function in interrupt context.
306
307 Example:
308 soc_pm_init()
309 {
310 /* Do things */
311 r = opp_init_cpufreq_table(dev, &freq_table);
312 if (!r)
313 cpufreq_frequency_table_cpuinfo(policy, freq_table);
314 /* Do other things */
315 }
316
317 NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in
318 addition to CONFIG_PM as power management feature is required to
319 dynamically scale voltage and frequency in a system.
320
3217. Data Structures
322==================
323Typically an SoC contains multiple voltage domains which are variable. Each
324domain is represented by a device pointer. The relationship to OPP can be
325represented as follows:
326SoC
327 |- device 1
328 | |- opp 1 (availability, freq, voltage)
329 | |- opp 2 ..
330 ... ...
331 | `- opp n ..
332 |- device 2
333 ...
334 `- device m
335
336OPP library maintains a internal list that the SoC framework populates and
337accessed by various functions as described above. However, the structures
338representing the actual OPPs and domains are internal to the OPP library itself
339to allow for suitable abstraction reusable across systems.
340
341struct opp - The internal data structure of OPP library which is used to
342 represent an OPP. In addition to the freq, voltage, availability
343 information, it also contains internal book keeping information required
344 for the OPP library to operate on. Pointer to this structure is
345 provided back to the users such as SoC framework to be used as a
346 identifier for OPP in the interactions with OPP layer.
347
348 WARNING: The struct opp pointer should not be parsed or modified by the
349 users. The defaults of for an instance is populated by opp_add, but the
350 availability of the OPP can be modified by opp_enable/disable functions.
351
352struct device - This is used to identify a domain to the OPP layer. The
353 nature of the device and it's implementation is left to the user of
354 OPP library such as the SoC framework.
355
356Overall, in a simplistic view, the data structure operations is represented as
357following:
358
359Initialization / modification:
360 +-----+ /- opp_enable
361opp_add --> | opp | <-------
362 | +-----+ \- opp_disable
363 \-------> domain_info(device)
364
365Search functions:
366 /-- opp_find_freq_ceil ---\ +-----+
367domain_info<---- opp_find_freq_exact -----> | opp |
368 \-- opp_find_freq_floor ---/ +-----+
369
370Retrieval functions:
371+-----+ /- opp_get_voltage
372| opp | <---
373+-----+ \- opp_get_freq
374
375domain_info <- opp_get_opp_count
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 55b859b3bc72..489e9bacd165 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -1,6 +1,7 @@
1Run-time Power Management Framework for I/O Devices 1Run-time Power Management Framework for I/O Devices
2 2
3(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. 3(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
4(C) 2010 Alan Stern <stern@rowland.harvard.edu>
4 5
51. Introduction 61. Introduction
6 7
@@ -157,7 +158,8 @@ rules:
157 to execute it, the other callbacks will not be executed for the same device. 158 to execute it, the other callbacks will not be executed for the same device.
158 159
159 * A request to execute ->runtime_resume() will cancel any pending or 160 * A request to execute ->runtime_resume() will cancel any pending or
160 scheduled requests to execute the other callbacks for the same device. 161 scheduled requests to execute the other callbacks for the same device,
162 except for scheduled autosuspends.
161 163
1623. Run-time PM Device Fields 1643. Run-time PM Device Fields
163 165
@@ -165,7 +167,7 @@ The following device run-time PM fields are present in 'struct dev_pm_info', as
165defined in include/linux/pm.h: 167defined in include/linux/pm.h:
166 168
167 struct timer_list suspend_timer; 169 struct timer_list suspend_timer;
168 - timer used for scheduling (delayed) suspend request 170 - timer used for scheduling (delayed) suspend and autosuspend requests
169 171
170 unsigned long timer_expires; 172 unsigned long timer_expires;
171 - timer expiration time, in jiffies (if this is different from zero, the 173 - timer expiration time, in jiffies (if this is different from zero, the
@@ -230,6 +232,28 @@ defined in include/linux/pm.h:
230 interface; it may only be modified with the help of the pm_runtime_allow() 232 interface; it may only be modified with the help of the pm_runtime_allow()
231 and pm_runtime_forbid() helper functions 233 and pm_runtime_forbid() helper functions
232 234
235 unsigned int no_callbacks;
236 - indicates that the device does not use the run-time PM callbacks (see
237 Section 8); it may be modified only by the pm_runtime_no_callbacks()
238 helper function
239
240 unsigned int use_autosuspend;
241 - indicates that the device's driver supports delayed autosuspend (see
242 Section 9); it may be modified only by the
243 pm_runtime{_dont}_use_autosuspend() helper functions
244
245 unsigned int timer_autosuspends;
246 - indicates that the PM core should attempt to carry out an autosuspend
247 when the timer expires rather than a normal suspend
248
249 int autosuspend_delay;
250 - the delay time (in milliseconds) to be used for autosuspend
251
252 unsigned long last_busy;
253 - the time (in jiffies) when the pm_runtime_mark_last_busy() helper
254 function was last called for this device; used in calculating inactivity
255 periods for autosuspend
256
233All of the above fields are members of the 'power' member of 'struct device'. 257All of the above fields are members of the 'power' member of 'struct device'.
234 258
2354. Run-time PM Device Helper Functions 2594. Run-time PM Device Helper Functions
@@ -255,6 +279,12 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
255 error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt 279 error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
256 to suspend the device again in future 280 to suspend the device again in future
257 281
282 int pm_runtime_autosuspend(struct device *dev);
283 - same as pm_runtime_suspend() except that the autosuspend delay is taken
284 into account; if pm_runtime_autosuspend_expiration() says the delay has
285 not yet expired then an autosuspend is scheduled for the appropriate time
286 and 0 is returned
287
258 int pm_runtime_resume(struct device *dev); 288 int pm_runtime_resume(struct device *dev);
259 - execute the subsystem-level resume callback for the device; returns 0 on 289 - execute the subsystem-level resume callback for the device; returns 0 on
260 success, 1 if the device's run-time PM status was already 'active' or 290 success, 1 if the device's run-time PM status was already 'active' or
@@ -267,6 +297,11 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
267 device (the request is represented by a work item in pm_wq); returns 0 on 297 device (the request is represented by a work item in pm_wq); returns 0 on
268 success or error code if the request has not been queued up 298 success or error code if the request has not been queued up
269 299
300 int pm_request_autosuspend(struct device *dev);
301 - schedule the execution of the subsystem-level suspend callback for the
302 device when the autosuspend delay has expired; if the delay has already
303 expired then the work item is queued up immediately
304
270 int pm_schedule_suspend(struct device *dev, unsigned int delay); 305 int pm_schedule_suspend(struct device *dev, unsigned int delay);
271 - schedule the execution of the subsystem-level suspend callback for the 306 - schedule the execution of the subsystem-level suspend callback for the
272 device in future, where 'delay' is the time to wait before queuing up a 307 device in future, where 'delay' is the time to wait before queuing up a
@@ -298,12 +333,20 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
298 - decrement the device's usage counter 333 - decrement the device's usage counter
299 334
300 int pm_runtime_put(struct device *dev); 335 int pm_runtime_put(struct device *dev);
301 - decrement the device's usage counter, run pm_request_idle(dev) and return 336 - decrement the device's usage counter; if the result is 0 then run
302 its result 337 pm_request_idle(dev) and return its result
338
339 int pm_runtime_put_autosuspend(struct device *dev);
340 - decrement the device's usage counter; if the result is 0 then run
341 pm_request_autosuspend(dev) and return its result
303 342
304 int pm_runtime_put_sync(struct device *dev); 343 int pm_runtime_put_sync(struct device *dev);
305 - decrement the device's usage counter, run pm_runtime_idle(dev) and return 344 - decrement the device's usage counter; if the result is 0 then run
306 its result 345 pm_runtime_idle(dev) and return its result
346
347 int pm_runtime_put_sync_autosuspend(struct device *dev);
348 - decrement the device's usage counter; if the result is 0 then run
349 pm_runtime_autosuspend(dev) and return its result
307 350
308 void pm_runtime_enable(struct device *dev); 351 void pm_runtime_enable(struct device *dev);
309 - enable the run-time PM helper functions to run the device bus type's 352 - enable the run-time PM helper functions to run the device bus type's
@@ -349,19 +392,51 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
349 counter (used by the /sys/devices/.../power/control interface to 392 counter (used by the /sys/devices/.../power/control interface to
350 effectively prevent the device from being power managed at run time) 393 effectively prevent the device from being power managed at run time)
351 394
395 void pm_runtime_no_callbacks(struct device *dev);
396 - set the power.no_callbacks flag for the device and remove the run-time
397 PM attributes from /sys/devices/.../power (or prevent them from being
398 added when the device is registered)
399
400 void pm_runtime_mark_last_busy(struct device *dev);
401 - set the power.last_busy field to the current time
402
403 void pm_runtime_use_autosuspend(struct device *dev);
404 - set the power.use_autosuspend flag, enabling autosuspend delays
405
406 void pm_runtime_dont_use_autosuspend(struct device *dev);
407 - clear the power.use_autosuspend flag, disabling autosuspend delays
408
409 void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
410 - set the power.autosuspend_delay value to 'delay' (expressed in
411 milliseconds); if 'delay' is negative then run-time suspends are
412 prevented
413
414 unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
415 - calculate the time when the current autosuspend delay period will expire,
416 based on power.last_busy and power.autosuspend_delay; if the delay time
417 is 1000 ms or larger then the expiration time is rounded up to the
418 nearest second; returns 0 if the delay period has already expired or
419 power.use_autosuspend isn't set, otherwise returns the expiration time
420 in jiffies
421
352It is safe to execute the following helper functions from interrupt context: 422It is safe to execute the following helper functions from interrupt context:
353 423
354pm_request_idle() 424pm_request_idle()
425pm_request_autosuspend()
355pm_schedule_suspend() 426pm_schedule_suspend()
356pm_request_resume() 427pm_request_resume()
357pm_runtime_get_noresume() 428pm_runtime_get_noresume()
358pm_runtime_get() 429pm_runtime_get()
359pm_runtime_put_noidle() 430pm_runtime_put_noidle()
360pm_runtime_put() 431pm_runtime_put()
432pm_runtime_put_autosuspend()
433pm_runtime_enable()
361pm_suspend_ignore_children() 434pm_suspend_ignore_children()
362pm_runtime_set_active() 435pm_runtime_set_active()
363pm_runtime_set_suspended() 436pm_runtime_set_suspended()
364pm_runtime_enable() 437pm_runtime_suspended()
438pm_runtime_mark_last_busy()
439pm_runtime_autosuspend_expiration()
365 440
3665. Run-time PM Initialization, Device Probing and Removal 4415. Run-time PM Initialization, Device Probing and Removal
367 442
@@ -524,3 +599,141 @@ poweroff and run-time suspend callback, and similarly for system resume, thaw,
524restore, and run-time resume, can achieve this with the help of the 599restore, and run-time resume, can achieve this with the help of the
525UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its 600UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
526last argument to NULL). 601last argument to NULL).
602
6038. "No-Callback" Devices
604
605Some "devices" are only logical sub-devices of their parent and cannot be
606power-managed on their own. (The prototype example is a USB interface. Entire
607USB devices can go into low-power mode or send wake-up requests, but neither is
608possible for individual interfaces.) The drivers for these devices have no
609need of run-time PM callbacks; if the callbacks did exist, ->runtime_suspend()
610and ->runtime_resume() would always return 0 without doing anything else and
611->runtime_idle() would always call pm_runtime_suspend().
612
613Subsystems can tell the PM core about these devices by calling
614pm_runtime_no_callbacks(). This should be done after the device structure is
615initialized and before it is registered (although after device registration is
616also okay). The routine will set the device's power.no_callbacks flag and
617prevent the non-debugging run-time PM sysfs attributes from being created.
618
619When power.no_callbacks is set, the PM core will not invoke the
620->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks.
621Instead it will assume that suspends and resumes always succeed and that idle
622devices should be suspended.
623
624As a consequence, the PM core will never directly inform the device's subsystem
625or driver about run-time power changes. Instead, the driver for the device's
626parent must take responsibility for telling the device's driver when the
627parent's power state changes.
628
6299. Autosuspend, or automatically-delayed suspends
630
631Changing a device's power state isn't free; it requires both time and energy.
632A device should be put in a low-power state only when there's some reason to
633think it will remain in that state for a substantial time. A common heuristic
634says that a device which hasn't been used for a while is liable to remain
635unused; following this advice, drivers should not allow devices to be suspended
636at run-time until they have been inactive for some minimum period. Even when
637the heuristic ends up being non-optimal, it will still prevent devices from
638"bouncing" too rapidly between low-power and full-power states.
639
640The term "autosuspend" is an historical remnant. It doesn't mean that the
641device is automatically suspended (the subsystem or driver still has to call
642the appropriate PM routines); rather it means that run-time suspends will
643automatically be delayed until the desired period of inactivity has elapsed.
644
645Inactivity is determined based on the power.last_busy field. Drivers should
646call pm_runtime_mark_last_busy() to update this field after carrying out I/O,
647typically just before calling pm_runtime_put_autosuspend(). The desired length
648of the inactivity period is a matter of policy. Subsystems can set this length
649initially by calling pm_runtime_set_autosuspend_delay(), but after device
650registration the length should be controlled by user space, using the
651/sys/devices/.../power/autosuspend_delay_ms attribute.
652
653In order to use autosuspend, subsystems or drivers must call
654pm_runtime_use_autosuspend() (preferably before registering the device), and
655thereafter they should use the various *_autosuspend() helper functions instead
656of the non-autosuspend counterparts:
657
658 Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
659 Instead of: pm_schedule_suspend use: pm_request_autosuspend;
660 Instead of: pm_runtime_put use: pm_runtime_put_autosuspend;
661 Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend.
662
663Drivers may also continue to use the non-autosuspend helper functions; they
664will behave normally, not taking the autosuspend delay into account.
665Similarly, if the power.use_autosuspend field isn't set then the autosuspend
666helper functions will behave just like the non-autosuspend counterparts.
667
668The implementation is well suited for asynchronous use in interrupt contexts.
669However such use inevitably involves races, because the PM core can't
670synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
671This synchronization must be handled by the driver, using its private lock.
672Here is a schematic pseudo-code example:
673
674 foo_read_or_write(struct foo_priv *foo, void *data)
675 {
676 lock(&foo->private_lock);
677 add_request_to_io_queue(foo, data);
678 if (foo->num_pending_requests++ == 0)
679 pm_runtime_get(&foo->dev);
680 if (!foo->is_suspended)
681 foo_process_next_request(foo);
682 unlock(&foo->private_lock);
683 }
684
685 foo_io_completion(struct foo_priv *foo, void *req)
686 {
687 lock(&foo->private_lock);
688 if (--foo->num_pending_requests == 0) {
689 pm_runtime_mark_last_busy(&foo->dev);
690 pm_runtime_put_autosuspend(&foo->dev);
691 } else {
692 foo_process_next_request(foo);
693 }
694 unlock(&foo->private_lock);
695 /* Send req result back to the user ... */
696 }
697
698 int foo_runtime_suspend(struct device *dev)
699 {
700 struct foo_priv foo = container_of(dev, ...);
701 int ret = 0;
702
703 lock(&foo->private_lock);
704 if (foo->num_pending_requests > 0) {
705 ret = -EBUSY;
706 } else {
707 /* ... suspend the device ... */
708 foo->is_suspended = 1;
709 }
710 unlock(&foo->private_lock);
711 return ret;
712 }
713
714 int foo_runtime_resume(struct device *dev)
715 {
716 struct foo_priv foo = container_of(dev, ...);
717
718 lock(&foo->private_lock);
719 /* ... resume the device ... */
720 foo->is_suspended = 0;
721 pm_runtime_mark_last_busy(&foo->dev);
722 if (foo->num_pending_requests > 0)
723 foo_process_requests(foo);
724 unlock(&foo->private_lock);
725 return 0;
726 }
727
728The important point is that after foo_io_completion() asks for an autosuspend,
729the foo_runtime_suspend() callback may race with foo_read_or_write().
730Therefore foo_runtime_suspend() has to check whether there are any pending I/O
731requests (while holding the private lock) before allowing the suspend to
732proceed.
733
734In addition, the power.autosuspend_delay field can be changed by user space at
735any time. If a driver cares about this, it can call
736pm_runtime_autosuspend_expiration() from within the ->runtime_suspend()
737callback while holding its private lock. If the function returns a nonzero
738value then the delay has not yet expired and the callback should return
739-EAGAIN.
diff --git a/Documentation/power/s2ram.txt b/Documentation/power/s2ram.txt
index 514b94fc931e..1bdfa0443773 100644
--- a/Documentation/power/s2ram.txt
+++ b/Documentation/power/s2ram.txt
@@ -49,6 +49,13 @@ machine that doesn't boot) is:
49 device (lspci and /sys/devices/pci* is your friend), and see if you can 49 device (lspci and /sys/devices/pci* is your friend), and see if you can
50 fix it, disable it, or trace into its resume function. 50 fix it, disable it, or trace into its resume function.
51 51
52 If no device matches the hash (or any matches appear to be false positives),
53 the culprit may be a device from a loadable kernel module that is not loaded
54 until after the hash is checked. You can check the hash against the current
55 devices again after more modules are loaded using sysfs:
56
57 cat /sys/power/pm_trace_dev_match
58
52For example, the above happens to be the VGA device on my EVO, which I 59For example, the above happens to be the VGA device on my EVO, which I
53used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out 60used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out
54that "radeonfb" simply cannot resume that device - it tries to set the 61that "radeonfb" simply cannot resume that device - it tries to set the
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index 9d60ab717a7b..ea718891a665 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -66,7 +66,8 @@ swsusp saves the state of the machine into active swaps and then reboots or
66powerdowns. You must explicitly specify the swap partition to resume from with 66powerdowns. You must explicitly specify the swap partition to resume from with
67``resume='' kernel option. If signature is found it loads and restores saved 67``resume='' kernel option. If signature is found it loads and restores saved
68state. If the option ``noresume'' is specified as a boot parameter, it skips 68state. If the option ``noresume'' is specified as a boot parameter, it skips
69the resuming. 69the resuming. If the option ``hibernate=nocompress'' is specified as a boot
70parameter, it saves hibernation image without compression.
70 71
71In the meantime while the system is suspended you should not add/remove any 72In the meantime while the system is suspended you should not add/remove any
72of the hardware, write to the filesystems, etc. 73of the hardware, write to the filesystems, etc.