Diffstat (limited to 'Documentation/power')
-rw-r--r-- Documentation/power/00-INDEX               |   2
-rw-r--r-- Documentation/power/devices.txt            | 119
-rw-r--r-- Documentation/power/drivers-testing.txt    |   8
-rw-r--r-- Documentation/power/interface.txt          |   2
-rw-r--r-- Documentation/power/notifiers.txt          |  53
-rw-r--r-- Documentation/power/opp.txt                | 378
-rw-r--r-- Documentation/power/regulator/machine.txt  |   4
-rw-r--r-- Documentation/power/runtime_pm.txt         | 306
-rw-r--r-- Documentation/power/s2ram.txt              |   7
-rw-r--r-- Documentation/power/states.txt             |  12
-rw-r--r-- Documentation/power/swsusp.txt             |   7
-rw-r--r-- Documentation/power/userland-swsusp.txt    |   6
12 files changed, 773 insertions, 131 deletions
diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
index fb742c213c9e..45e9d4a91284 100644
--- a/Documentation/power/00-INDEX
+++ b/Documentation/power/00-INDEX
@@ -14,6 +14,8 @@ interface.txt
 	- Power management user interface in /sys/power
 notifiers.txt
 	- Registering suspend notifiers in device drivers
+opp.txt
+	- Operating Performance Point library
 pci.txt
 	- How the PCI Subsystem Does Power Management
 pm_qos_interface.txt
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 57080cd74575..64565aac6e40 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -1,6 +1,6 @@
 Device Power Management
 
-Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+Copyright (c) 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
 Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu>
 
 
@@ -159,18 +159,18 @@ matter, and the kernel is responsible for keeping track of it. By contrast,
 whether or not a wakeup-capable device should issue wakeup events is a policy
 decision, and it is managed by user space through a sysfs attribute: the
 power/wakeup file. User space can write the strings "enabled" or "disabled" to
-set or clear the should_wakeup flag, respectively. Reads from the file will
-return the corresponding string if can_wakeup is true, but if can_wakeup is
-false then reads will return an empty string, to indicate that the device
-doesn't support wakeup events. (But even though the file appears empty, writes
-will still affect the should_wakeup flag.)
+set or clear the "should_wakeup" flag, respectively. This file is only present
+for wakeup-capable devices (i.e. devices whose "can_wakeup" flags are set)
+and is created (or removed) by device_set_wakeup_capable(). Reads from the
+file will return the corresponding string.
 
 The device_may_wakeup() routine returns true only if both flags are set.
-Drivers should check this routine when putting devices in a low-power state
-during a system sleep transition, to see whether or not to enable the devices'
-wakeup mechanisms. However for runtime power management, wakeup events should
-be enabled whenever the device and driver both support them, regardless of the
-should_wakeup flag.
+This information is used by subsystems, like the PCI bus type code, to see
+whether or not to enable the devices' wakeup mechanisms. If device wakeup
+mechanisms are enabled or disabled directly by drivers, they also should use
+device_may_wakeup() to decide what to do during a system sleep transition.
+However for runtime power management, wakeup events should be enabled whenever
+the device and driver both support them, regardless of the should_wakeup flag.
 
 
 /sys/devices/.../power/control files
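
The wakeup rule described in the hunk above can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions: struct
wakeup_model, model_write_wakeup() and model_may_wakeup() are hypothetical
names invented for illustration, not kernel interfaces, and the model only
captures the two documented facts (power/wakeup exists only for
wakeup-capable devices, and device_may_wakeup() requires both flags).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for a device's two wakeup flags. */
struct wakeup_model {
	bool can_wakeup;	/* capability: tracked by the kernel */
	bool should_wakeup;	/* policy: set through power/wakeup */
};

/*
 * Model a write to power/wakeup. Returns 0 on success and -1 if the
 * attribute would not exist (device not wakeup-capable) or the string
 * is not one of the two accepted values.
 */
static int model_write_wakeup(struct wakeup_model *w, const char *s)
{
	assert(w != NULL && s != NULL);
	if (!w->can_wakeup)
		return -1;
	if (strcmp(s, "enabled") == 0)
		w->should_wakeup = true;
	else if (strcmp(s, "disabled") == 0)
		w->should_wakeup = false;
	else
		return -1;
	return 0;
}

/* The documented rule: true only if both flags are set. */
static bool model_may_wakeup(const struct wakeup_model *w)
{
	assert(w != NULL);
	return w->can_wakeup && w->should_wakeup;
}
```

In this model a device that is not wakeup-capable can never report true from
model_may_wakeup(), whatever user space attempts to write.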
@@ -249,23 +249,18 @@ various phases always run after tasks have been frozen and before they are
 unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have
 been disabled (except for those marked with the IRQ_WAKEUP flag).
 
-Most phases use bus, type, and class callbacks (that is, methods defined in
-dev->bus->pm, dev->type->pm, and dev->class->pm). The prepare and complete
-phases are exceptions; they use only bus callbacks. When multiple callbacks
-are used in a phase, they are invoked in the order: <class, type, bus> during
-power-down transitions and in the opposite order during power-up transitions.
-For example, during the suspend phase the PM core invokes
-
-	dev->class->pm.suspend(dev);
-	dev->type->pm.suspend(dev);
-	dev->bus->pm.suspend(dev);
-
-before moving on to the next device, whereas during the resume phase the core
-invokes
-
-	dev->bus->pm.resume(dev);
-	dev->type->pm.resume(dev);
-	dev->class->pm.resume(dev);
+All phases use bus, type, or class callbacks (that is, methods defined in
+dev->bus->pm, dev->type->pm, or dev->class->pm). These callbacks are mutually
+exclusive, so if the device type provides a struct dev_pm_ops object pointed to
+by its pm field (i.e. both dev->type and dev->type->pm are defined), the
+callbacks included in that object (i.e. dev->type->pm) will be used. Otherwise,
+if the class provides a struct dev_pm_ops object pointed to by its pm field
+(i.e. both dev->class and dev->class->pm are defined), the PM core will use the
+callbacks from that object (i.e. dev->class->pm). Finally, if the pm fields of
+both the device type and class objects are NULL (or those objects do not exist),
+the callbacks provided by the bus (that is, the callbacks from dev->bus->pm)
+will be used (this allows device types to override callbacks provided by bus
+types or classes if necessary).
 
 These callbacks may in turn invoke device- or driver-specific methods stored in
 dev->driver->pm, but they don't have to.
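
The new selection order (device type first, then class, then bus) can be
written down as a small userspace sketch. The structures below are simplified
stand-ins invented for illustration (model_pm_ops, model_subsys, model_device
and model_select_pm_ops() are assumptions, not the kernel's definitions); the
only thing taken from the text above is the order in which the pm pointers are
checked.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct dev_pm_ops and the type/class/bus objects. */
struct model_pm_ops {
	int (*suspend)(void *dev);
};

struct model_subsys {
	const struct model_pm_ops *pm;
};

struct model_device {
	const struct model_subsys *type;
	const struct model_subsys *cls;		/* dev->class in the text */
	const struct model_subsys *bus;
};

/*
 * Pick exactly one set of callbacks: the device type's if it has one,
 * otherwise the class's, otherwise the bus's. Returns NULL if none of
 * the three provides callbacks.
 */
static const struct model_pm_ops *
model_select_pm_ops(const struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->type && dev->type->pm)
		return dev->type->pm;
	if (dev->cls && dev->cls->pm)
		return dev->cls->pm;
	if (dev->bus && dev->bus->pm)
		return dev->bus->pm;
	return NULL;
}
```

Because only one object is ever selected, a device type that sets its pm
pointer fully replaces the class and bus callbacks in this model rather than
running in addition to them.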
@@ -284,11 +279,15 @@ When the system goes into the standby or memory sleep state, the phases are:
 	time.) Unlike the other suspend-related phases, during the prepare
 	phase the device tree is traversed top-down.
 
-	The prepare phase uses only a bus callback. After the callback method
-	returns, no new children may be registered below the device. The method
-	may also prepare the device or driver in some way for the upcoming
-	system power transition, but it should not put the device into a
-	low-power state.
+	In addition to that, if device drivers need to allocate additional
+	memory to be able to handle device suspend correctly, that should be
+	done in the prepare phase.
+
+	After the prepare callback method returns, no new children may be
+	registered below the device. The method may also prepare the device or
+	driver in some way for the upcoming system power transition (for
+	example, by allocating additional memory required for this purpose), but
+	it should not put the device into a low-power state.
 
     2.	The suspend methods should quiesce the device to stop it from performing
 	I/O. They also may save the device registers and put it into the
@@ -372,7 +371,7 @@ Drivers need to be able to handle hardware which has been reset since the
 suspend methods were called, for example by complete reinitialization.
 This may be the hardest part, and the one most protected by NDA'd documents
 and chip errata. It's simplest if the hardware state hasn't changed since
-the suspend was carried out, but that can't be guaranteed (in fact, it ususally
+the suspend was carried out, but that can't be guaranteed (in fact, it usually
 is not the case).
 
 Drivers must also be prepared to notice that the device has been removed
@@ -507,30 +506,34 @@ routines. Nevertheless, different callback pointers are used in case there is a
 situation where it actually matters.
 
 
-System Devices
---------------
-System devices (sysdevs) follow a slightly different API, which can be found in
-
-	include/linux/sysdev.h
-	drivers/base/sys.c
-
-System devices will be suspended with interrupts disabled, and after all other
-devices have been suspended. On resume, they will be resumed before any other
-devices, and also with interrupts disabled. These things occur in special
-"sysdev_driver" phases, which affect only system devices.
-
-Thus, after the suspend_noirq (or freeze_noirq or poweroff_noirq) phase, when
-the non-boot CPUs are all offline and IRQs are disabled on the remaining online
-CPU, then a sysdev_driver.suspend phase is carried out, and the system enters a
-sleep state (or a system image is created). During resume (or after the image
-has been created or loaded) a sysdev_driver.resume phase is carried out, IRQs
-are enabled on the only online CPU, the non-boot CPUs are enabled, and the
-resume_noirq (or thaw_noirq or restore_noirq) phase begins.
-
-Code to actually enter and exit the system-wide low power state sometimes
-involves hardware details that are only known to the boot firmware, and
-may leave a CPU running software (from SRAM or flash memory) that monitors
-the system and manages its wakeup sequence.
+Device Power Domains
+--------------------
+Sometimes devices share reference clocks or other power resources. In those
+cases it generally is not possible to put devices into low-power states
+individually. Instead, a set of devices sharing a power resource can be put
+into a low-power state together at the same time by turning off the shared
+power resource. Of course, they also need to be put into the full-power state
+together, by turning the shared power resource on. A set of devices with this
+property is often referred to as a power domain.
+
+Support for power domains is provided through the pwr_domain field of struct
+device. This field is a pointer to an object of type struct dev_power_domain,
+defined in include/linux/pm.h, providing a set of power management callbacks
+analogous to the subsystem-level and device driver callbacks that are executed
+for the given device during all power transitions, instead of the respective
+subsystem-level callbacks. Specifically, if a device's pwr_domain pointer is
+not NULL, the ->suspend() callback from the object pointed to by it will be
+executed instead of its subsystem's (e.g. bus type's) ->suspend() callback and
+analogously for all of the remaining callbacks. In other words, power management
+domain callbacks, if defined for the given device, always take precedence over
+the callbacks provided by the device's subsystem (e.g. bus type).
+
+The support for device power management domains is only relevant to platforms
+needing to use the same device driver power management callbacks in many
+different power domain configurations and wanting to avoid incorporating the
+support for power domains into subsystem-level callbacks, for example by
+modifying the platform bus type. Other platforms need not implement it or take
+it into account in any way.
 
 
 Device Low Power (suspend) States
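
The precedence rule for power domains added above can be sketched in userspace
C. Everything here is a simplified model: model_pm_ops, model_power_domain,
model_device and model_effective_pm_ops() are hypothetical names, and the
subsystem callbacks are collapsed into one pointer. The sketch assumes, as the
text states, that a non-NULL power domain pointer replaces the subsystem-level
callbacks as a whole.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a set of power management callbacks. */
struct model_pm_ops {
	int (*suspend)(void *dev);
};

/* Stand-in for the object that the device's power domain pointer refers to. */
struct model_power_domain {
	struct model_pm_ops ops;
};

struct model_device {
	const struct model_power_domain *pwr_domain;	/* may be NULL */
	const struct model_pm_ops *subsys_pm;		/* e.g. bus type's */
};

/*
 * If the device belongs to a power domain, the domain's callbacks are
 * used instead of the subsystem's; otherwise fall back to the subsystem.
 */
static const struct model_pm_ops *
model_effective_pm_ops(const struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->pwr_domain)
		return &dev->pwr_domain->ops;
	return dev->subsys_pm;
}
```

A platform that never sets the power domain pointer sees no change in this
model, which matches the statement that other platforms need not take power
domains into account.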
diff --git a/Documentation/power/drivers-testing.txt b/Documentation/power/drivers-testing.txt
index 7f7a737f7f9f..638afdf4d6b8 100644
--- a/Documentation/power/drivers-testing.txt
+++ b/Documentation/power/drivers-testing.txt
@@ -23,10 +23,10 @@ Once you have resolved the suspend/resume-related problems with your test system
 without the new driver, you are ready to test it:
 
 a) Build the driver as a module, load it and try the test modes of hibernation
-   (see: Documents/power/basic-pm-debugging.txt, 1).
+   (see: Documentation/power/basic-pm-debugging.txt, 1).
 
 b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
-   "platform" modes (see: Documents/power/basic-pm-debugging.txt, 1).
+   "platform" modes (see: Documentation/power/basic-pm-debugging.txt, 1).
 
 c) Compile the driver directly into the kernel and try the test modes of
    hibernation.
@@ -34,12 +34,12 @@ c) Compile the driver directly into the kernel and try the test modes of
 d) Attempt to hibernate with the driver compiled directly into the kernel
    in the "reboot", "shutdown" and "platform" modes.
 
-e) Try the test modes of suspend (see: Documents/power/basic-pm-debugging.txt,
+e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.txt,
    2). [As far as the STR tests are concerned, it should not matter whether or
    not the driver is built as a module.]
 
 f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
-   (see: Documents/power/basic-pm-debugging.txt, 2).
+   (see: Documentation/power/basic-pm-debugging.txt, 2).
 
 Each of the above tests should be repeated several times and the STD tests
 should be mixed with the STR tests. If any of them fails, the driver cannot be
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
index e67211fe0ee2..c537834af005 100644
--- a/Documentation/power/interface.txt
+++ b/Documentation/power/interface.txt
@@ -57,7 +57,7 @@ smallest image possible. In particular, if "0" is written to this file, the
 suspend image will be as small as possible.
 
 Reading from this file will display the current image size limit, which
-is set to 500 MB by default.
+is set to 2/5 of available RAM by default.
 
 /sys/power/pm_trace controls the code which saves the last PM event point in
 the RTC across reboots, so that you can debug a machine that just hangs
diff --git a/Documentation/power/notifiers.txt b/Documentation/power/notifiers.txt
index ae1b7ec07684..c2a4a346c0d9 100644
--- a/Documentation/power/notifiers.txt
+++ b/Documentation/power/notifiers.txt
@@ -1,46 +1,41 @@
 Suspend notifiers
-	(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+	(C) 2007-2011 Rafael J. Wysocki <rjw@sisk.pl>, GPL
 
-There are some operations that device drivers may want to carry out in their
-.suspend() routines, but shouldn't, because they can cause the hibernation or
-suspend to fail. For example, a driver may want to allocate a substantial amount
-of memory (like 50 MB) in .suspend(), but that shouldn't be done after the
-swsusp's memory shrinker has run.
-
-Also, there may be some operations, that subsystems want to carry out before a
-hibernation/suspend or after a restore/resume, requiring the system to be fully
-functional, so the drivers' .suspend() and .resume() routines are not suitable
-for this purpose. For example, device drivers may want to upload firmware to
-their devices after a restore from a hibernation image, but they cannot do it by
-calling request_firmware() from their .resume() routines (user land processes
-are frozen at this point). The solution may be to load the firmware into
-memory before processes are frozen and upload it from there in the .resume()
-routine. Of course, a hibernation notifier may be used for this purpose.
-
-The subsystems that have such needs can register suspend notifiers that will be
-called upon the following events by the suspend core:
+There are some operations that subsystems or drivers may want to carry out
+before hibernation/suspend or after restore/resume, but they require the system
+to be fully functional, so the drivers' and subsystems' .suspend() and .resume()
+or even .prepare() and .complete() callbacks are not suitable for this purpose.
+For example, device drivers may want to upload firmware to their devices after
+resume/restore, but they cannot do it by calling request_firmware() from their
+.resume() or .complete() routines (user land processes are frozen at these
+points). The solution may be to load the firmware into memory before processes
+are frozen and upload it from there in the .resume() routine.
+A suspend/hibernation notifier may be used for this purpose.
+
+The subsystems or drivers having such needs can register suspend notifiers that
+will be called upon the following events by the PM core:
 
 PM_HIBERNATION_PREPARE	The system is going to hibernate or suspend, tasks will
 			be frozen immediately.
 
 PM_POST_HIBERNATION	The system memory state has been restored from a
-			hibernation image or an error occured during the
-			hibernation. Device drivers' .resume() callbacks have
+			hibernation image or an error occurred during
+			hibernation. Device drivers' restore callbacks have
 			been executed and tasks have been thawed.
 
 PM_RESTORE_PREPARE	The system is going to restore a hibernation image.
-			If all goes well the restored kernel will issue a
+			If all goes well, the restored kernel will issue a
 			PM_POST_HIBERNATION notification.
 
-PM_POST_RESTORE		An error occurred during the hibernation restore.
-			Device drivers' .resume() callbacks have been executed
+PM_POST_RESTORE		An error occurred during restore from hibernation.
+			Device drivers' restore callbacks have been executed
 			and tasks have been thawed.
 
-PM_SUSPEND_PREPARE	The system is preparing for a suspend.
+PM_SUSPEND_PREPARE	The system is preparing for suspend.
 
-PM_POST_SUSPEND		The system has just resumed or an error occured during
-			the suspend. Device drivers' .resume() callbacks have
-			been executed and tasks have been thawed.
+PM_POST_SUSPEND		The system has just resumed or an error occurred during
+			suspend. Device drivers' resume callbacks have been
+			executed and tasks have been thawed.
 
 It is generally assumed that whatever the notifiers do for
 PM_HIBERNATION_PREPARE, should be undone for PM_POST_HIBERNATION. Analogously,
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
new file mode 100644
index 000000000000..5ae70a12c1e2
--- /dev/null
+++ b/Documentation/power/opp.txt
@@ -0,0 +1,378 @@
+*=============*
+* OPP Library *
+*=============*
+
+(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated
+
+Contents
+--------
+1. Introduction
+2. Initial OPP List Registration
+3. OPP Search Functions
+4. OPP Availability Control Functions
+5. OPP Data Retrieval Functions
+6. Cpufreq Table Generation
+7. Data Structures
+
+1. Introduction
+===============
+Complex SoCs of today consist of multiple sub-modules working in conjunction.
+In an operational system executing varied use cases, not all modules in the SoC
+need to function at their highest performing frequency all the time. To
+facilitate this, sub-modules in a SoC are grouped into domains, allowing some
+domains to run at lower voltage and frequency while other domains are loaded
+more. The set of discrete tuples consisting of frequency and voltage pairs that
+the device will support per domain are called Operating Performance Points or
+OPPs.
+
+The OPP library provides a set of helper functions to organize and query the
+OPP information. The library is located in drivers/base/power/opp.c and the
+header is located in include/linux/opp.h. The OPP library can be enabled by
+enabling CONFIG_PM_OPP from the power management menuconfig menu. The OPP
+library depends on CONFIG_PM because certain SoC frameworks, such as Texas
+Instruments' OMAP framework, allow optionally booting at a certain OPP without
+needing cpufreq.
+
+Typical usage of the OPP library is as follows:
+(users)		-> registers a set of default OPPs		-> (library)
+SoC framework	-> modifies on required cases certain OPPs	-> OPP layer
+		-> queries to search/retrieve information	->
+
+Architectures that provide a SoC framework for OPP should select ARCH_HAS_OPP
+to make the OPP layer available.
+
+The OPP layer expects each domain to be represented by a unique device pointer.
+The SoC framework registers a set of initial OPPs per device with the OPP layer.
+This list is expected to be an optimally small number, typically around 5 per
+device. This initial list contains a set of OPPs that the framework expects to
+be safely enabled by default in the system.
+
+Note on OPP Availability:
+------------------------
+As the system proceeds to operate, the SoC framework may choose to make certain
+OPPs available or not available on each device based on various external
+factors. Example usage: thermal management or other exceptional situations where
+the SoC framework might choose to disable a higher frequency OPP to safely
+continue operations until that OPP could be re-enabled if possible.
+
+The OPP library facilitates this concept in its implementation. The following
+operational functions operate only on available opps:
+opp_find_freq_{ceil, floor}, opp_get_voltage, opp_get_freq, opp_get_opp_count
+and opp_init_cpufreq_table
+
+opp_find_freq_exact is meant to be used to find the opp pointer which can then
+be used for opp_enable/disable functions to make an opp available as required.
+
+WARNING: Users of the OPP library should refresh their availability count using
+opp_get_opp_count if opp_enable/disable functions are invoked for a device. The
+exact mechanism to trigger these or the notification mechanism to other
+dependent subsystems such as cpufreq are left to the discretion of the SoC
+specific framework which uses the OPP library. Similar care needs to be taken
+to refresh the cpufreq table in cases of these operations.
+
+WARNING on OPP List locking mechanism:
+-------------------------------------------------
+The OPP library uses RCU for exclusivity. RCU allows the query functions to
+operate in multiple contexts, and this synchronization mechanism is optimal for
+read-intensive operations on data structures, which is what the OPP library
+caters to.
+
+To ensure that the data retrieved are sane, users such as the SoC framework
+should ensure that the section of code operating on OPP queries is locked
+using RCU read locks. The opp_find_freq_{exact,ceil,floor},
+opp_get_{voltage, freq, opp_count} fall into this category.
+
+opp_{add,enable,disable} are updaters which use a mutex and implement their own
+RCU locking mechanisms. opp_init_cpufreq_table acts as an updater and uses a
+mutex to implement the RCU updater strategy. These functions should *NOT* be
+called under RCU locks and other contexts that prevent blocking functions in
+RCU or mutex operations from working.
+
+2. Initial OPP List Registration
+================================
+The SoC implementation calls the opp_add function iteratively to add OPPs per
+device. It is expected that the SoC framework will register the OPP entries
+optimally; typical numbers are less than 5. The list generated by
+registering the OPPs is maintained by the OPP library throughout the device
+operation. The SoC framework can subsequently control the availability of the
+OPPs dynamically using the opp_enable / disable functions.
+
+opp_add - Add a new OPP for a specific domain represented by the device pointer.
+	The OPP is defined using the frequency and voltage. Once added, the OPP
+	is assumed to be available and control of its availability can be done
+	with the opp_enable/disable functions. The OPP library internally stores
+	and manages this information in the opp struct. This function may be
+	used by the SoC framework to define an optimal list as per the demands
+	of the SoC usage environment.
+
+	WARNING: Do not use this function in interrupt context.
+
+	Example:
+	 soc_pm_init()
+	 {
+		/* Do things */
+		r = opp_add(mpu_dev, 1000000, 900000);
+		if (r) {
+			pr_err("%s: unable to register mpu opp(%d)\n",
+			       __func__, r);
+			goto no_cpufreq;
+		}
+		/* Do cpufreq things */
+	 no_cpufreq:
+		/* Do remaining things */
+	 }
+
+3. OPP Search Functions
+=======================
+High level frameworks such as cpufreq operate on frequencies. To map the
+frequency back to the corresponding OPP, the OPP library provides handy
+functions to search the OPP list that the OPP library internally manages. These
+search functions return the matching pointer representing the opp if a match is
+found, otherwise they return an error. These errors are expected to be handled
+by standard error checks such as IS_ERR() and appropriate actions taken by the
+caller.
+
+opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
+	availability. This function is especially useful to enable an OPP which
+	is not available by default.
+	Example: In a case when the SoC framework detects a situation where a
+	higher frequency could be made available, it can use this function to
+	find the OPP prior to calling opp_enable to actually make it available.
+	 rcu_read_lock();
+	 opp = opp_find_freq_exact(dev, 1000000000, false);
+	 rcu_read_unlock();
+	 /* don't operate on the pointer.. just do a sanity check.. */
+	 if (IS_ERR(opp)) {
+		pr_err("frequency not disabled!\n");
+		/* trigger appropriate actions.. */
+	 } else {
+		opp_enable(dev, 1000000000);
+	 }
+
+	NOTE: This is the only search function that operates on OPPs which are
+	not available.
+
+opp_find_freq_floor - Search for an available OPP which is *at most* the
+	provided frequency. This function is useful while searching for a lesser
+	match OR operating on OPP information in the order of decreasing
+	frequency.
+	Example: To find the highest opp for a device:
+	 freq = ULONG_MAX;
+	 rcu_read_lock();
+	 opp_find_freq_floor(dev, &freq);
+	 rcu_read_unlock();
+
+opp_find_freq_ceil - Search for an available OPP which is *at least* the
+	provided frequency. This function is useful while searching for a
+	higher match OR operating on OPP information in the order of increasing
+	frequency.
+	Example 1: To find the lowest opp for a device:
+	 freq = 0;
+	 rcu_read_lock();
+	 opp_find_freq_ceil(dev, &freq);
+	 rcu_read_unlock();
+	Example 2: A simplified implementation of a SoC cpufreq_driver->target:
+	 soc_cpufreq_target(..)
+	 {
+		/* Do stuff like policy checks etc. */
+		/* Find the best frequency match for the req */
+		rcu_read_lock();
+		opp = opp_find_freq_ceil(dev, &freq);
+		rcu_read_unlock();
+		if (!IS_ERR(opp))
+			soc_switch_to_freq_voltage(freq);
+		else
+			/* do something when we can't satisfy the req */
+		/* do other stuff */
+	 }
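
The floor/ceil semantics described in section 3 can be tried out with a small
userspace model. This is a sketch under stated assumptions, not the library's
implementation: struct model_opp, model_find_freq_ceil() and
model_find_freq_floor() are hypothetical names, the table is assumed to be
sorted by increasing frequency, and the model returns NULL where the text says
the real search functions return an error to be checked with IS_ERR().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one OPP entry. */
struct model_opp {
	unsigned long rate;	/* frequency in Hz */
	unsigned long u_volt;	/* voltage in microvolts */
	bool available;
};

/*
 * Lowest available entry whose rate is at least *freq. On success *freq
 * is updated to the matched rate. The table must be sorted by rate.
 */
static const struct model_opp *
model_find_freq_ceil(const struct model_opp *tbl, size_t n, unsigned long *freq)
{
	size_t i;

	assert(tbl != NULL && freq != NULL);
	for (i = 0; i < n; i++) {
		if (tbl[i].available && tbl[i].rate >= *freq) {
			*freq = tbl[i].rate;
			return &tbl[i];
		}
	}
	return NULL;
}

/* Highest available entry whose rate is at most *freq. */
static const struct model_opp *
model_find_freq_floor(const struct model_opp *tbl, size_t n, unsigned long *freq)
{
	size_t i;

	assert(tbl != NULL && freq != NULL);
	for (i = n; i-- > 0; ) {
		if (tbl[i].available && tbl[i].rate <= *freq) {
			*freq = tbl[i].rate;
			return &tbl[i];
		}
	}
	return NULL;
}
```

Starting the ceil search from 0 yields the lowest available entry and starting
the floor search from the largest unsigned long yields the highest one, which
mirrors the two "lowest opp" and "highest opp" examples in the text.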
+
+4. OPP Availability Control Functions
+=====================================
+A default OPP list registered with the OPP library may not cater to all possible
+situations. The OPP library provides a set of functions to modify the
+availability of an OPP within the OPP list. This allows SoC frameworks to have
+fine grained dynamic control of which sets of OPPs are operationally available.
+These functions are intended to *temporarily* remove an OPP in conditions such
+as thermal considerations (e.g. don't use OPPx until the temperature drops).
+
+WARNING: Do not use these functions in interrupt context.
+
+opp_enable - Make an OPP available for operation.
+	Example: Let's say that the 1GHz OPP is to be made available only if the
+	SoC temperature is lower than a certain threshold. The SoC framework
+	implementation might choose to do something as follows:
+	 if (cur_temp < temp_low_thresh) {
+		/* Enable 1GHz if it was disabled */
+		rcu_read_lock();
+		opp = opp_find_freq_exact(dev, 1000000000, false);
+		rcu_read_unlock();
+		/* just error check */
+		if (!IS_ERR(opp))
+			ret = opp_enable(dev, 1000000000);
+		else
+			goto try_something_else;
+	 }
+
+opp_disable - Make an OPP unavailable for operation.
+	Example: Let's say that the 1GHz OPP is to be disabled if the temperature
+	exceeds a threshold value. The SoC framework implementation might
+	choose to do something as follows:
+	 if (cur_temp > temp_high_thresh) {
+		/* Disable 1GHz if it was enabled */
+		rcu_read_lock();
+		opp = opp_find_freq_exact(dev, 1000000000, true);
+		rcu_read_unlock();
+		/* just error check */
+		if (!IS_ERR(opp))
+			ret = opp_disable(dev, 1000000000);
+		else
+			goto try_something_else;
+	 }
+
2285. OPP Data Retrieval Functions
229===============================
230Since OPP library abstracts away the OPP information, a set of functions to pull
231information from the OPP structure is necessary. Once an OPP pointer is
232retrieved using the search functions, the following functions can be used by SoC
233framework to retrieve the information represented inside the OPP layer.
234
235opp_get_voltage - Retrieve the voltage represented by the opp pointer.
236 Example: At a cpufreq transition to a different frequency, SoC
237 framework requires to set the voltage represented by the OPP using
238 the regulator framework to the Power Management chip providing the
239 voltage.
240 soc_switch_to_freq_voltage(freq)
241 {
242 /* do things */
243 rcu_read_lock();
244 opp = opp_find_freq_ceil(dev, &freq);
245 v = opp_get_voltage(opp);
246 rcu_read_unlock();
247 if (v)
248 regulator_set_voltage(.., v);
249 /* do other things */
250 }
251
252opp_get_freq - Retrieve the freq represented by the opp pointer.
253 Example: Lets say the SoC framework uses a couple of helper functions
254 we could pass opp pointers instead of doing additional parameters to
255 handle quiet a bit of data parameters.
256 soc_cpufreq_target(..)
257 {
258 /* do things.. */
259 max_freq = ULONG_MAX;
260 rcu_read_lock();
261 max_opp = opp_find_freq_floor(dev, &max_freq);
262 requested_opp = opp_find_freq_ceil(dev, &freq);
263 if (!IS_ERR(max_opp) && !IS_ERR(requested_opp))
264 r = soc_test_validity(max_opp, requested_opp);
265 rcu_read_unlock();
266 /* do other things */
267 }
268 soc_test_validity(..)
269 {
270 if (opp_get_voltage(max_opp) < opp_get_voltage(requested_opp))
271 return -EINVAL;
272 if (opp_get_freq(max_opp) < opp_get_freq(requested_opp))
273 return -EINVAL;
274 /* do things.. */
275 }
276
277opp_get_opp_count - Retrieve the number of available opps for a device
278 Example: Let's say a co-processor in the SoC needs to know the available
279 frequencies in a table; the main processor can notify it as follows:
280 soc_notify_coproc_available_frequencies()
281 {
282 /* Do things */
283 rcu_read_lock();
284 num_available = opp_get_opp_count(dev);
285 speeds = kzalloc(sizeof(u32) * num_available, GFP_ATOMIC);
286 /* no sleeping under rcu_read_lock(); fill in increasing order */
287 freq = 0;
288 while (!IS_ERR(opp = opp_find_freq_ceil(dev, &freq))) {
289 speeds[i] = freq;
290 freq++;
291 i++;
292 }
293 rcu_read_unlock();
294
295 soc_notify_coproc(AVAILABLE_FREQs, speeds, num_available);
296 /* Do other things */
297 }
298
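The loop in the opp_get_opp_count example relies on the ceil search writing the matched frequency back through its pointer argument, so that freq++ steps just past the last match. A minimal userspace sketch of that enumeration follows. It assumes a table sorted by ascending frequency, and struct freq_entry, model_find_ceil() and model_list_freqs() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table entry; the table is assumed sorted by ascending freq. */
struct freq_entry {
	unsigned long freq_hz;
	bool available;
};

/*
 * Return the lowest available entry with freq_hz >= *freq and write its
 * frequency back through 'freq', mirroring how opp_find_freq_ceil()
 * updates its argument. Returns NULL when nothing matches (the kernel
 * call returns an ERR_PTR() instead).
 */
static const struct freq_entry *model_find_ceil(const struct freq_entry *table,
						size_t n, unsigned long *freq)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].available && table[i].freq_hz >= *freq) {
			*freq = table[i].freq_hz;
			return &table[i];
		}
	}
	return NULL;
}

/*
 * Collect every available frequency in increasing order. At most out_len
 * entries are written; the number written is returned.
 */
static size_t model_list_freqs(const struct freq_entry *table, size_t n,
			       unsigned long *out, size_t out_len)
{
	unsigned long freq = 0;
	size_t count = 0;

	while (count < out_len && model_find_ceil(table, n, &freq)) {
		out[count++] = freq;
		freq++;	/* step past the match so the next search moves on */
	}
	return count;
}
```

Unlike the schematic example, this sketch bounds the loop by the size of the output buffer, which guards against the table gaining entries between counting and enumeration.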
2996. Cpufreq Table Generation
300===========================
301opp_init_cpufreq_table - The cpufreq framework is typically initialized with
302 cpufreq_frequency_table_cpuinfo, which is provided with the list of
303 frequencies that are available for operation. This function provides
304 a ready-to-use conversion routine to translate the OPP layer's internal
305 information about the available frequencies into a format that can be
306 passed directly to cpufreq.
307
308 WARNING: Do not use this function in interrupt context.
309
310 Example:
311 soc_pm_init()
312 {
313 /* Do things */
314 r = opp_init_cpufreq_table(dev, &freq_table);
315 if (!r)
316 cpufreq_frequency_table_cpuinfo(policy, freq_table);
317 /* Do other things */
318 }
319
320 NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in
321 addition to CONFIG_PM, as the power management feature is required to
322 dynamically scale voltage and frequency in a system.
323
3247. Data Structures
325==================
326Typically an SoC contains multiple voltage domains which are variable. Each
327domain is represented by a device pointer. The relationship to OPP can be
328represented as follows:
329SoC
330 |- device 1
331 |      |- opp 1 (availability, freq, voltage)
332 |      |- opp 2 ..
333 ...    ...
334 |      `- opp n ..
335 |- device 2
336 ...
337 `- device m
338
339The OPP library maintains an internal list that the SoC framework populates and
340that is accessed by various functions as described above. However, the
341structures representing the actual OPPs and domains are internal to the OPP
342library itself to allow for suitable abstraction reusable across systems.
343
344struct opp - The internal data structure of the OPP library which is used to
345 represent an OPP. In addition to the freq, voltage and availability
346 information, it also contains internal bookkeeping information required
347 for the OPP library to operate. A pointer to this structure is
348 provided back to users such as the SoC framework, to be used as an
349 identifier for the OPP in interactions with the OPP layer.
350
351 WARNING: The struct opp pointer should not be parsed or modified by the
352 users. The defaults for an instance are populated by opp_add, but the
353 availability of the OPP can be modified by opp_enable/disable functions.
354
355struct device - This is used to identify a domain to the OPP layer. The
356 nature of the device and its implementation is left to the user of
357 the OPP library, such as the SoC framework.
358
359Overall, in a simplistic view, the data structure operations are represented as
360follows:
361
362Initialization / modification:
363            +-----+        /- opp_enable
364opp_add --> | opp | <-------
365  |         +-----+        \- opp_disable
366  \-------> domain_info(device)
367
368Search functions:
369             /-- opp_find_freq_ceil  ---\   +-----+
370domain_info<---- opp_find_freq_exact -----> | opp |
371             \-- opp_find_freq_floor ---/   +-----+
372
373Retrieval functions:
374+-----+     /- opp_get_voltage
375| opp | <---
376+-----+     \- opp_get_freq
377
378domain_info <- opp_get_opp_count
diff --git a/Documentation/power/regulator/machine.txt b/Documentation/power/regulator/machine.txt
index bdec39b9bd75..b42419b52e44 100644
--- a/Documentation/power/regulator/machine.txt
+++ b/Documentation/power/regulator/machine.txt
@@ -53,11 +53,11 @@ static struct regulator_init_data regulator1_data = {
53 53
54Regulator-1 supplies power to Regulator-2. This relationship must be registered 54Regulator-1 supplies power to Regulator-2. This relationship must be registered
55with the core so that Regulator-1 is also enabled when Consumer A enables its 55with the core so that Regulator-1 is also enabled when Consumer A enables its
56supply (Regulator-2). The supply regulator is set by the supply_regulator_dev 56supply (Regulator-2). The supply regulator is set by the supply_regulator
57field below:- 57field below:-
58 58
59static struct regulator_init_data regulator2_data = { 59static struct regulator_init_data regulator2_data = {
60 .supply_regulator_dev = &platform_regulator1_device.dev, 60 .supply_regulator = "regulator_name",
61 .constraints = { 61 .constraints = {
62 .min_uV = 1800000, 62 .min_uV = 1800000,
63 .max_uV = 2000000, 63 .max_uV = 2000000,
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 55b859b3bc72..b24875b1ced5 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -1,6 +1,7 @@
1Run-time Power Management Framework for I/O Devices 1Run-time Power Management Framework for I/O Devices
2 2
3(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. 3(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
4(C) 2010 Alan Stern <stern@rowland.harvard.edu>
4 5
51. Introduction 61. Introduction
6 7
@@ -43,11 +44,21 @@ struct dev_pm_ops {
43}; 44};
44 45
45The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are 46The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are
46executed by the PM core for either the bus type, or device type (if the bus 47executed by the PM core for either the device type, or the class (if the device
47type's callback is not defined), or device class (if the bus type's and device 48type's struct dev_pm_ops object does not exist), or the bus type (if the
48type's callbacks are not defined) of given device. The bus type, device type 49device type's and class' struct dev_pm_ops objects do not exist) of the given
49and device class callbacks are referred to as subsystem-level callbacks in what 50device (this allows device types to override callbacks provided by bus types or
50follows. 51classes if necessary). The bus type, device type and class callbacks are
52referred to as subsystem-level callbacks in what follows.
53
54By default, the callbacks are always invoked in process context with interrupts
55enabled. However, subsystems can use the pm_runtime_irq_safe() helper function
56to tell the PM core that a device's ->runtime_suspend() and ->runtime_resume()
57callbacks should be invoked in atomic context with interrupts disabled
58(->runtime_idle() is still invoked the default way). This implies that these
59callback routines must not block or sleep, but it also means that the
60synchronous helper functions listed at the end of Section 4 can be used within
61an interrupt handler or in an atomic context.
51 62
52The subsystem-level suspend callback is _entirely_ _responsible_ for handling 63The subsystem-level suspend callback is _entirely_ _responsible_ for handling
53the suspend of the device as appropriate, which may, but need not include 64the suspend of the device as appropriate, which may, but need not include
@@ -157,7 +168,8 @@ rules:
157 to execute it, the other callbacks will not be executed for the same device. 168 to execute it, the other callbacks will not be executed for the same device.
158 169
159 * A request to execute ->runtime_resume() will cancel any pending or 170 * A request to execute ->runtime_resume() will cancel any pending or
160 scheduled requests to execute the other callbacks for the same device. 171 scheduled requests to execute the other callbacks for the same device,
172 except for scheduled autosuspends.
161 173
1623. Run-time PM Device Fields 1743. Run-time PM Device Fields
163 175
@@ -165,7 +177,7 @@ The following device run-time PM fields are present in 'struct dev_pm_info', as
165defined in include/linux/pm.h: 177defined in include/linux/pm.h:
166 178
167 struct timer_list suspend_timer; 179 struct timer_list suspend_timer;
168 - timer used for scheduling (delayed) suspend request 180 - timer used for scheduling (delayed) suspend and autosuspend requests
169 181
170 unsigned long timer_expires; 182 unsigned long timer_expires;
171 - timer expiration time, in jiffies (if this is different from zero, the 183 - timer expiration time, in jiffies (if this is different from zero, the
@@ -230,6 +242,32 @@ defined in include/linux/pm.h:
230 interface; it may only be modified with the help of the pm_runtime_allow() 242 interface; it may only be modified with the help of the pm_runtime_allow()
231 and pm_runtime_forbid() helper functions 243 and pm_runtime_forbid() helper functions
232 244
245 unsigned int no_callbacks;
246 - indicates that the device does not use the run-time PM callbacks (see
247 Section 8); it may be modified only by the pm_runtime_no_callbacks()
248 helper function
249
250 unsigned int irq_safe;
251 - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
252 will be invoked with the spinlock held and interrupts disabled
253
254 unsigned int use_autosuspend;
255 - indicates that the device's driver supports delayed autosuspend (see
256 Section 9); it may be modified only by the
257 pm_runtime{_dont}_use_autosuspend() helper functions
258
259 unsigned int timer_autosuspends;
260 - indicates that the PM core should attempt to carry out an autosuspend
261 when the timer expires rather than a normal suspend
262
263 int autosuspend_delay;
264 - the delay time (in milliseconds) to be used for autosuspend
265
266 unsigned long last_busy;
267 - the time (in jiffies) when the pm_runtime_mark_last_busy() helper
268 function was last called for this device; used in calculating inactivity
269 periods for autosuspend
270
233All of the above fields are members of the 'power' member of 'struct device'. 271All of the above fields are members of the 'power' member of 'struct device'.
234 272
2354. Run-time PM Device Helper Functions 2734. Run-time PM Device Helper Functions
@@ -255,6 +293,12 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
255 error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt 293 error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
256 to suspend the device again in future 294 to suspend the device again in future
257 295
296 int pm_runtime_autosuspend(struct device *dev);
297 - same as pm_runtime_suspend() except that the autosuspend delay is taken
298 into account; if pm_runtime_autosuspend_expiration() says the delay has
299 not yet expired then an autosuspend is scheduled for the appropriate time
300 and 0 is returned
301
258 int pm_runtime_resume(struct device *dev); 302 int pm_runtime_resume(struct device *dev);
259 - execute the subsystem-level resume callback for the device; returns 0 on 303 - execute the subsystem-level resume callback for the device; returns 0 on
260 success, 1 if the device's run-time PM status was already 'active' or 304 success, 1 if the device's run-time PM status was already 'active' or
@@ -267,6 +311,11 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
267 device (the request is represented by a work item in pm_wq); returns 0 on 311 device (the request is represented by a work item in pm_wq); returns 0 on
268 success or error code if the request has not been queued up 312 success or error code if the request has not been queued up
269 313
314 int pm_request_autosuspend(struct device *dev);
315 - schedule the execution of the subsystem-level suspend callback for the
316 device when the autosuspend delay has expired; if the delay has already
317 expired then the work item is queued up immediately
318
270 int pm_schedule_suspend(struct device *dev, unsigned int delay); 319 int pm_schedule_suspend(struct device *dev, unsigned int delay);
271 - schedule the execution of the subsystem-level suspend callback for the 320 - schedule the execution of the subsystem-level suspend callback for the
272 device in future, where 'delay' is the time to wait before queuing up a 321 device in future, where 'delay' is the time to wait before queuing up a
@@ -298,12 +347,24 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
298 - decrement the device's usage counter 347 - decrement the device's usage counter
299 348
300 int pm_runtime_put(struct device *dev); 349 int pm_runtime_put(struct device *dev);
301 - decrement the device's usage counter, run pm_request_idle(dev) and return 350 - decrement the device's usage counter; if the result is 0 then run
302 its result 351 pm_request_idle(dev) and return its result
352
353 int pm_runtime_put_autosuspend(struct device *dev);
354 - decrement the device's usage counter; if the result is 0 then run
355 pm_request_autosuspend(dev) and return its result
303 356
304 int pm_runtime_put_sync(struct device *dev); 357 int pm_runtime_put_sync(struct device *dev);
305 - decrement the device's usage counter, run pm_runtime_idle(dev) and return 358 - decrement the device's usage counter; if the result is 0 then run
306 its result 359 pm_runtime_idle(dev) and return its result
360
361 int pm_runtime_put_sync_suspend(struct device *dev);
362 - decrement the device's usage counter; if the result is 0 then run
363 pm_runtime_suspend(dev) and return its result
364
365 int pm_runtime_put_sync_autosuspend(struct device *dev);
366 - decrement the device's usage counter; if the result is 0 then run
367 pm_runtime_autosuspend(dev) and return its result
307 368
308 void pm_runtime_enable(struct device *dev); 369 void pm_runtime_enable(struct device *dev);
309 - enable the run-time PM helper functions to run the device bus type's 370 - enable the run-time PM helper functions to run the device bus type's
@@ -336,8 +397,8 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
336 zero) 397 zero)
337 398
338 bool pm_runtime_suspended(struct device *dev); 399 bool pm_runtime_suspended(struct device *dev);
339 - return true if the device's runtime PM status is 'suspended', or false 400 - return true if the device's runtime PM status is 'suspended' and its
340 otherwise 401 'power.disable_depth' field is equal to zero, or false otherwise
341 402
342 void pm_runtime_allow(struct device *dev); 403 void pm_runtime_allow(struct device *dev);
343 - set the power.runtime_auto flag for the device and decrease its usage 404 - set the power.runtime_auto flag for the device and decrease its usage
@@ -349,19 +410,65 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
349 counter (used by the /sys/devices/.../power/control interface to 410 counter (used by the /sys/devices/.../power/control interface to
350 effectively prevent the device from being power managed at run time) 411 effectively prevent the device from being power managed at run time)
351 412
413 void pm_runtime_no_callbacks(struct device *dev);
414 - set the power.no_callbacks flag for the device and remove the run-time
415 PM attributes from /sys/devices/.../power (or prevent them from being
416 added when the device is registered)
417
418 void pm_runtime_irq_safe(struct device *dev);
419 - set the power.irq_safe flag for the device, causing the runtime-PM
420 suspend and resume callbacks (but not the idle callback) to be invoked
421 with interrupts disabled
422
423 void pm_runtime_mark_last_busy(struct device *dev);
424 - set the power.last_busy field to the current time
425
426 void pm_runtime_use_autosuspend(struct device *dev);
427 - set the power.use_autosuspend flag, enabling autosuspend delays
428
429 void pm_runtime_dont_use_autosuspend(struct device *dev);
430 - clear the power.use_autosuspend flag, disabling autosuspend delays
431
432 void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
433 - set the power.autosuspend_delay value to 'delay' (expressed in
434 milliseconds); if 'delay' is negative then run-time suspends are
435 prevented
436
437 unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
438 - calculate the time when the current autosuspend delay period will expire,
439 based on power.last_busy and power.autosuspend_delay; if the delay time
440 is 1000 ms or larger then the expiration time is rounded up to the
441 nearest second; returns 0 if the delay period has already expired or
442 power.use_autosuspend isn't set, otherwise returns the expiration time
443 in jiffies
444
352It is safe to execute the following helper functions from interrupt context: 445It is safe to execute the following helper functions from interrupt context:
353 446
354pm_request_idle() 447pm_request_idle()
448pm_request_autosuspend()
355pm_schedule_suspend() 449pm_schedule_suspend()
356pm_request_resume() 450pm_request_resume()
357pm_runtime_get_noresume() 451pm_runtime_get_noresume()
358pm_runtime_get() 452pm_runtime_get()
359pm_runtime_put_noidle() 453pm_runtime_put_noidle()
360pm_runtime_put() 454pm_runtime_put()
455pm_runtime_put_autosuspend()
456pm_runtime_enable()
361pm_suspend_ignore_children() 457pm_suspend_ignore_children()
362pm_runtime_set_active() 458pm_runtime_set_active()
363pm_runtime_set_suspended() 459pm_runtime_set_suspended()
364pm_runtime_enable() 460pm_runtime_suspended()
461pm_runtime_mark_last_busy()
462pm_runtime_autosuspend_expiration()
463
464If pm_runtime_irq_safe() has been called for a device then the following helper
465functions may also be used in interrupt context:
466
467pm_runtime_suspend()
468pm_runtime_autosuspend()
469pm_runtime_resume()
470pm_runtime_get_sync()
471pm_runtime_put_sync_suspend()
365 472
3665. Run-time PM Initialization, Device Probing and Removal 4735. Run-time PM Initialization, Device Probing and Removal
367 474
@@ -394,13 +501,29 @@ helper functions described in Section 4. In that case, pm_runtime_resume()
394should be used. Of course, for this purpose the device's run-time PM has to be 501should be used. Of course, for this purpose the device's run-time PM has to be
395enabled earlier by calling pm_runtime_enable(). 502enabled earlier by calling pm_runtime_enable().
396 503
397If the device bus type's or driver's ->probe() or ->remove() callback runs 504If the device bus type's or driver's ->probe() callback runs
398pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, 505pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
399they will fail returning -EAGAIN, because the device's usage counter is 506they will fail returning -EAGAIN, because the device's usage counter is
400incremented by the core before executing ->probe() and ->remove(). Still, it 507incremented by the driver core before executing ->probe(). Still, it may be
401may be desirable to suspend the device as soon as ->probe() or ->remove() has 508desirable to suspend the device as soon as ->probe() has finished, so the driver
402finished, so the PM core uses pm_runtime_idle_sync() to invoke the 509core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for
403subsystem-level idle callback for the device at that time. 510the device at that time.
511
512Moreover, the driver core prevents runtime PM callbacks from racing with the bus
513notifier callback in __device_release_driver(), which is necessary, because the
514notifier is used by some subsystems to carry out operations affecting the
515runtime PM functionality. It does so by calling pm_runtime_get_sync() before
516driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications. This
517resumes the device if it's in the suspended state and prevents it from
518being suspended again while those routines are being executed.
519
520To allow bus types and drivers to put devices into the suspended state by
521calling pm_runtime_suspend() from their ->remove() routines, the driver core
522executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER
523notifications in __device_release_driver(). This requires bus types and
524drivers to make their ->remove() callbacks avoid races with runtime PM directly,
525but it also allows more flexibility in the handling of devices during the
526removal of their drivers.
404 527
405The user space can effectively disallow the driver of the device to power manage 528The user space can effectively disallow the driver of the device to power manage
406it at run time by changing the value of its /sys/devices/.../power/control 529it at run time by changing the value of its /sys/devices/.../power/control
@@ -459,11 +582,6 @@ to do this is:
459 pm_runtime_set_active(dev); 582 pm_runtime_set_active(dev);
460 pm_runtime_enable(dev); 583 pm_runtime_enable(dev);
461 584
462The PM core always increments the run-time usage counter before calling the
463->prepare() callback and decrements it after calling the ->complete() callback.
464Hence disabling run-time PM temporarily like this will not cause any run-time
465suspend callbacks to be lost.
466
4677. Generic subsystem callbacks 5857. Generic subsystem callbacks
468 586
469Subsystems may wish to conserve code space by using the set of generic power 587Subsystems may wish to conserve code space by using the set of generic power
@@ -524,3 +642,141 @@ poweroff and run-time suspend callback, and similarly for system resume, thaw,
524restore, and run-time resume, can achieve this with the help of the 642restore, and run-time resume, can achieve this with the help of the
525UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its 643UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
526last argument to NULL). 644last argument to NULL).
645
6468. "No-Callback" Devices
647
648Some "devices" are only logical sub-devices of their parent and cannot be
649power-managed on their own. (The prototype example is a USB interface. Entire
650USB devices can go into low-power mode or send wake-up requests, but neither is
651possible for individual interfaces.) The drivers for these devices have no
652need of run-time PM callbacks; if the callbacks did exist, ->runtime_suspend()
653and ->runtime_resume() would always return 0 without doing anything else and
654->runtime_idle() would always call pm_runtime_suspend().
655
656Subsystems can tell the PM core about these devices by calling
657pm_runtime_no_callbacks(). This should be done after the device structure is
658initialized and before it is registered (although after device registration is
659also okay). The routine will set the device's power.no_callbacks flag and
660prevent the non-debugging run-time PM sysfs attributes from being created.
661
662When power.no_callbacks is set, the PM core will not invoke the
663->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks.
664Instead it will assume that suspends and resumes always succeed and that idle
665devices should be suspended.
666
667As a consequence, the PM core will never directly inform the device's subsystem
668or driver about run-time power changes. Instead, the driver for the device's
669parent must take responsibility for telling the device's driver when the
670parent's power state changes.
671
6729. Autosuspend, or automatically-delayed suspends
673
674Changing a device's power state isn't free; it requires both time and energy.
675A device should be put in a low-power state only when there's some reason to
676think it will remain in that state for a substantial time. A common heuristic
677says that a device which hasn't been used for a while is liable to remain
678unused; following this advice, drivers should not allow devices to be suspended
679at run-time until they have been inactive for some minimum period. Even when
680the heuristic ends up being non-optimal, it will still prevent devices from
681"bouncing" too rapidly between low-power and full-power states.
682
683The term "autosuspend" is an historical remnant. It doesn't mean that the
684device is automatically suspended (the subsystem or driver still has to call
685the appropriate PM routines); rather it means that run-time suspends will
686automatically be delayed until the desired period of inactivity has elapsed.
687
688Inactivity is determined based on the power.last_busy field. Drivers should
689call pm_runtime_mark_last_busy() to update this field after carrying out I/O,
690typically just before calling pm_runtime_put_autosuspend(). The desired length
691of the inactivity period is a matter of policy. Subsystems can set this length
692initially by calling pm_runtime_set_autosuspend_delay(), but after device
693registration the length should be controlled by user space, using the
694/sys/devices/.../power/autosuspend_delay_ms attribute.
695
696In order to use autosuspend, subsystems or drivers must call
697pm_runtime_use_autosuspend() (preferably before registering the device), and
698thereafter they should use the various *_autosuspend() helper functions instead
699of the non-autosuspend counterparts:
700
701 Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
702 Instead of: pm_schedule_suspend use: pm_request_autosuspend;
703 Instead of: pm_runtime_put use: pm_runtime_put_autosuspend;
704 Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend.
705
706Drivers may also continue to use the non-autosuspend helper functions; they
707will behave normally, not taking the autosuspend delay into account.
708Similarly, if the power.use_autosuspend field isn't set then the autosuspend
709helper functions will behave just like the non-autosuspend counterparts.
710
711The implementation is well suited for asynchronous use in interrupt contexts.
712However such use inevitably involves races, because the PM core can't
713synchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
714This synchronization must be handled by the driver, using its private lock.
715Here is a schematic pseudo-code example:
716
717 foo_read_or_write(struct foo_priv *foo, void *data)
718 {
719 lock(&foo->private_lock);
720 add_request_to_io_queue(foo, data);
721 if (foo->num_pending_requests++ == 0)
722 pm_runtime_get(&foo->dev);
723 if (!foo->is_suspended)
724 foo_process_next_request(foo);
725 unlock(&foo->private_lock);
726 }
727
728 foo_io_completion(struct foo_priv *foo, void *req)
729 {
730 lock(&foo->private_lock);
731 if (--foo->num_pending_requests == 0) {
732 pm_runtime_mark_last_busy(&foo->dev);
733 pm_runtime_put_autosuspend(&foo->dev);
734 } else {
735 foo_process_next_request(foo);
736 }
737 unlock(&foo->private_lock);
738 /* Send req result back to the user ... */
739 }
740
741 int foo_runtime_suspend(struct device *dev)
742 {
743 struct foo_priv *foo = container_of(dev, ...);
744 int ret = 0;
745
746 lock(&foo->private_lock);
747 if (foo->num_pending_requests > 0) {
748 ret = -EBUSY;
749 } else {
750 /* ... suspend the device ... */
751 foo->is_suspended = 1;
752 }
753 unlock(&foo->private_lock);
754 return ret;
755 }
756
757 int foo_runtime_resume(struct device *dev)
758 {
759 struct foo_priv *foo = container_of(dev, ...);
760
761 lock(&foo->private_lock);
762 /* ... resume the device ... */
763 foo->is_suspended = 0;
764 pm_runtime_mark_last_busy(&foo->dev);
765 if (foo->num_pending_requests > 0)
766 foo_process_next_request(foo);
767 unlock(&foo->private_lock);
768 return 0;
769 }
770
771The important point is that after foo_io_completion() asks for an autosuspend,
772the foo_runtime_suspend() callback may race with foo_read_or_write().
773Therefore foo_runtime_suspend() has to check whether there are any pending I/O
774requests (while holding the private lock) before allowing the suspend to
775proceed.
776
777In addition, the power.autosuspend_delay field can be changed by user space at
778any time. If a driver cares about this, it can call
779pm_runtime_autosuspend_expiration() from within the ->runtime_suspend()
780callback while holding its private lock. If the function returns a nonzero
781value then the delay has not yet expired and the callback should return
782-EAGAIN.
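
The two checks described above (pending I/O, then an unexpired autosuspend delay) can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the PM core's implementation: MODEL_HZ, struct model_pm and both functions are hypothetical, time is counted in ticks rather than jiffies, locking is omitted, and the corner case where a valid expiration time happens to equal 0 is ignored.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_HZ 100UL	/* hypothetical tick rate; the kernel's HZ is configurable */

/* Hypothetical subset of the autosuspend-related power.* fields. */
struct model_pm {
	bool use_autosuspend;
	unsigned int autosuspend_delay_ms;
	unsigned long last_busy;	/* in ticks */
};

/*
 * Model of the documented behaviour of pm_runtime_autosuspend_expiration():
 * returns 0 if autosuspend is not in use or the delay has already expired,
 * otherwise the expiration time in ticks.
 */
static unsigned long model_autosuspend_expiration(const struct model_pm *pm,
						  unsigned long now)
{
	unsigned long expires;

	if (!pm->use_autosuspend)
		return 0;

	expires = pm->last_busy + pm->autosuspend_delay_ms * MODEL_HZ / 1000;

	/* Delays of one second or more are rounded up to a whole second. */
	if (pm->autosuspend_delay_ms >= 1000)
		expires = (expires + MODEL_HZ - 1) / MODEL_HZ * MODEL_HZ;

	/* A signed difference keeps the comparison valid across counter wrap. */
	if ((long)(expires - now) <= 0)
		return 0;
	return expires;
}

/*
 * Model of a ->runtime_suspend() callback that refuses to suspend while
 * requests are pending or while the (possibly changed) delay is running.
 */
static int model_runtime_suspend(const struct model_pm *pm, unsigned long now,
				 unsigned int pending_requests)
{
	if (pending_requests > 0)
		return -EBUSY;
	if (model_autosuspend_expiration(pm, now) != 0)
		return -EAGAIN;
	/* ... a real driver would suspend the device here ... */
	return 0;
}
```

In a real driver both checks would run while holding the driver's private lock, as in foo_runtime_suspend() above, so that a request queued concurrently cannot slip in between the check and the suspend.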
diff --git a/Documentation/power/s2ram.txt b/Documentation/power/s2ram.txt
index 514b94fc931e..1bdfa0443773 100644
--- a/Documentation/power/s2ram.txt
+++ b/Documentation/power/s2ram.txt
@@ -49,6 +49,13 @@ machine that doesn't boot) is:
49 device (lspci and /sys/devices/pci* is your friend), and see if you can 49 device (lspci and /sys/devices/pci* is your friend), and see if you can
50 fix it, disable it, or trace into its resume function. 50 fix it, disable it, or trace into its resume function.
51 51
52 If no device matches the hash (or any matches appear to be false positives),
53 the culprit may be a device from a loadable kernel module that is not loaded
54 until after the hash is checked. You can check the hash against the current
55 devices again after more modules are loaded using sysfs:
56
57 cat /sys/power/pm_trace_dev_match
58
52For example, the above happens to be the VGA device on my EVO, which I 59For example, the above happens to be the VGA device on my EVO, which I
53used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out 60used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out
54that "radeonfb" simply cannot resume that device - it tries to set the 61that "radeonfb" simply cannot resume that device - it tries to set the
diff --git a/Documentation/power/states.txt b/Documentation/power/states.txt
index 34800cc521bf..4416b28630df 100644
--- a/Documentation/power/states.txt
+++ b/Documentation/power/states.txt
@@ -62,12 +62,12 @@ setup via another operating system for it to use. Despite the
62inconvenience, this method requires minimal work by the kernel, since 62inconvenience, this method requires minimal work by the kernel, since
63the firmware will also handle restoring memory contents on resume. 63the firmware will also handle restoring memory contents on resume.
64 64
65For suspend-to-disk, a mechanism called swsusp called 'swsusp' (Swap 65For suspend-to-disk, a mechanism called 'swsusp' (Swap Suspend) is used
66Suspend) is used to write memory contents to free swap space. 66to write memory contents to free swap space. swsusp has some restrictive
67swsusp has some restrictive requirements, but should work in most 67requirements, but should work in most cases. Some, albeit outdated,
68cases. Some, albeit outdated, documentation can be found in 68documentation can be found in Documentation/power/swsusp.txt.
69Documentation/power/swsusp.txt. Alternatively, userspace can do most 69Alternatively, userspace can do most of the actual suspend to disk work,
70of the actual suspend to disk work, see userland-swsusp.txt. 70see userland-swsusp.txt.
71 71
72Once memory state is written to disk, the system may either enter a 72Once memory state is written to disk, the system may either enter a
73low-power state (like ACPI S4), or it may simply power down. Powering 73low-power state (like ACPI S4), or it may simply power down. Powering
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index 9d60ab717a7b..ac190cf1963e 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -66,7 +66,8 @@ swsusp saves the state of the machine into active swaps and then reboots or
66powerdowns. You must explicitly specify the swap partition to resume from with 66powerdowns. You must explicitly specify the swap partition to resume from with
67``resume='' kernel option. If signature is found it loads and restores saved 67``resume='' kernel option. If signature is found it loads and restores saved
68state. If the option ``noresume'' is specified as a boot parameter, it skips 68state. If the option ``noresume'' is specified as a boot parameter, it skips
69the resuming. 69the resuming. If the option ``hibernate=nocompress'' is specified as a boot
70parameter, it saves hibernation image without compression.
70 71
71In the meantime while the system is suspended you should not add/remove any 72In the meantime while the system is suspended you should not add/remove any
72of the hardware, write to the filesystems, etc. 73of the hardware, write to the filesystems, etc.
@@ -191,7 +192,7 @@ Q: There don't seem to be any generally useful behavioral
191distinctions between SUSPEND and FREEZE. 192distinctions between SUSPEND and FREEZE.
192 193
193A: Doing SUSPEND when you are asked to do FREEZE is always correct, 194A: Doing SUSPEND when you are asked to do FREEZE is always correct,
194but it may be unneccessarily slow. If you want your driver to stay simple, 195but it may be unnecessarily slow. If you want your driver to stay simple,
195slowness may not matter to you. It can always be fixed later. 196slowness may not matter to you. It can always be fixed later.
196 197
197For devices like disk it does matter, you do not want to spindown for 198For devices like disk it does matter, you do not want to spindown for
@@ -236,7 +237,7 @@ disk. Whole sequence goes like
236 237
237 running system, user asks for suspend-to-disk 238 running system, user asks for suspend-to-disk
238 239
239 user processes are stopped (in common case there are none, but with resume-from-initrd, noone knows) 240 user processes are stopped (in common case there are none, but with resume-from-initrd, no one knows)
240 241
241 read image from disk 242 read image from disk
242 243
diff --git a/Documentation/power/userland-swsusp.txt b/Documentation/power/userland-swsusp.txt
index 81680f9f5909..1101bee4e822 100644
--- a/Documentation/power/userland-swsusp.txt
+++ b/Documentation/power/userland-swsusp.txt
@@ -98,7 +98,7 @@ SNAPSHOT_S2RAM - suspend to RAM; using this call causes the kernel to
98The device's read() operation can be used to transfer the snapshot image from 98The device's read() operation can be used to transfer the snapshot image from
99the kernel. It has the following limitations: 99the kernel. It has the following limitations:
100- you cannot read() more than one virtual memory page at a time 100- you cannot read() more than one virtual memory page at a time
101- read()s accross page boundaries are impossible (ie. if ypu read() 1/2 of 101- read()s across page boundaries are impossible (ie. if ypu read() 1/2 of
102 a page in the previous call, you will only be able to read() 102 a page in the previous call, you will only be able to read()
103 _at_ _most_ 1/2 of the page in the next call) 103 _at_ _most_ 1/2 of the page in the next call)
104 104
@@ -137,7 +137,7 @@ mechanism and the userland utilities using the interface SHOULD use additional
137means, such as checksums, to ensure the integrity of the snapshot image. 137means, such as checksums, to ensure the integrity of the snapshot image.
138 138
139The suspending and resuming utilities MUST lock themselves in memory, 139The suspending and resuming utilities MUST lock themselves in memory,
140preferrably using mlockall(), before calling SNAPSHOT_FREEZE. 140preferably using mlockall(), before calling SNAPSHOT_FREEZE.
141 141
142The suspending utility MUST check the value stored by SNAPSHOT_CREATE_IMAGE 142The suspending utility MUST check the value stored by SNAPSHOT_CREATE_IMAGE
143in the memory location pointed to by the last argument of ioctl() and proceed 143in the memory location pointed to by the last argument of ioctl() and proceed
@@ -147,7 +147,7 @@ in accordance with it:
147 (a) The suspending utility MUST NOT close the snapshot device 147 (a) The suspending utility MUST NOT close the snapshot device
148 _unless_ the whole suspend procedure is to be cancelled, in 148 _unless_ the whole suspend procedure is to be cancelled, in
149 which case, if the snapshot image has already been saved, the 149 which case, if the snapshot image has already been saved, the
150 suspending utility SHOULD destroy it, preferrably by zapping 150 suspending utility SHOULD destroy it, preferably by zapping
151 its header. If the suspend is not to be cancelled, the 151 its header. If the suspend is not to be cancelled, the
152 system MUST be powered off or rebooted after the snapshot 152 system MUST be powered off or rebooted after the snapshot
153 image has been saved. 153 image has been saved.
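
Note (illustration only, not part of the patch): the userland-swsusp.txt hunks above say that a read() on the snapshot device cannot return more than one page or cross a page boundary, and that the utilities MUST lock themselves in memory, preferably with mlockall(), before calling SNAPSHOT_FREEZE. The sketch below shows one way a utility might respect both rules. The helper names lock_in_memory(), write_all() and read_image_pages() are hypothetical, and the code works on arbitrary file descriptors rather than opening the snapshot device or issuing any ioctl().

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Lock current and future mappings; returns 0 on success, -1 on failure
 * (for example when the caller lacks the privilege to lock memory). */
static int lock_in_memory(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Copy everything from 'in' to 'out'. Each read() asks only for the
 * remainder of the current page, so no request exceeds one page or
 * spans a page boundary. Returns the byte count, or -1 on error.
 */
static long read_image_pages(int in, int out)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t filled = 0;	/* bytes of the current page read so far */
	long total = 0;
	char *buf;

	if (page <= 0)
		return -1;
	buf = malloc((size_t)page);
	if (!buf)
		return -1;

	for (;;) {
		ssize_t n = read(in, buf + filled, (size_t)page - filled);

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			goto fail;
		if (n == 0)
			break;		/* end of data */
		filled += (size_t)n;
		if (filled == (size_t)page) {
			if (write_all(out, buf, filled) < 0)
				goto fail;
			total += (long)filled;
			filled = 0;
		}
	}
	/* Flush a trailing partial page, if any. */
	if (filled > 0) {
		if (write_all(out, buf, filled) < 0)
			goto fail;
		total += (long)filled;
	}
	free(buf);
	return total;
fail:
	free(buf);
	return -1;
}
```

In a real suspending utility, lock_in_memory() would be called before the freeze ioctl and 'in' would be the descriptor of the snapshot device. Here any readable descriptor works, which keeps the sketch testable without touching power management.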