author     Alan Stern <stern@rowland.harvard.edu>  2010-03-26 18:53:55 -0400
committer  Rafael J. Wysocki <rjw@sisk.pl>  2010-05-10 17:08:16 -0400
commit     d6f9cda1fd241bc7a1d896da94950fd972eca9b7
tree       e2429cbfb7b59e52c77672a85c9e0ef1aa8c759e /Documentation/power
parent     624f6ec871886525ca19cf7841f918da91d4315e
PM: Improve device power management document

Improve the device power management document after it's been updated by the
previous patch.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Diffstat (limited to 'Documentation/power')
-rw-r--r--  Documentation/power/devices.txt  811
1 files changed, 383 insertions, 428 deletions
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 10018d19e0bf..57080cd74575 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -1,11 +1,13 @@
 Device Power Management
 
-(C) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
+Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu>
+
 
 Most of the code in Linux is device drivers, so most of the Linux power
-management code is also driver-specific. Most drivers will do very little;
-others, especially for platforms with small batteries (like cell phones),
-will do a lot.
+management (PM) code is also driver-specific. Most drivers will do very
+little; others, especially for platforms with small batteries (like cell
+phones), will do a lot.
 
 This writeup gives an overview of how drivers interact with system-wide
 power management goals, emphasizing the models and interfaces that are
@@ -19,9 +21,10 @@ Drivers will use one or both of these models to put devices into low-power
 states:
 
     System Sleep model:
-        Drivers can enter low power states as part of entering system-wide
-        low-power states like "suspend-to-ram", or (mostly for systems with
-        disks) "hibernate" (suspend-to-disk).
+        Drivers can enter low-power states as part of entering system-wide
+        low-power states like "suspend" (also known as "suspend-to-RAM"), or
+        (mostly for systems with disks) "hibernation" (also known as
+        "suspend-to-disk").
 
         This is something that device, bus, and class drivers collaborate on
         by implementing various role-specific suspend and resume methods to
@@ -29,41 +32,41 @@ states:
         them without loss of data.
 
         Some drivers can manage hardware wakeup events, which make the system
-        leave that low-power state. This feature may be enabled or disabled
+        leave the low-power state. This feature may be enabled or disabled
         using the relevant /sys/devices/.../power/wakeup file (for Ethernet
         drivers the ioctl interface used by ethtool may also be used for this
         purpose); enabling it may cost some power usage, but let the whole
-        system enter low power states more often.
+        system enter low-power states more often.
 
     Runtime Power Management model:
-        Devices may also be put into low power states while the system is
+        Devices may also be put into low-power states while the system is
         running, independently of other power management activity in principle.
         However, devices are not generally independent of each other (for
-        example, parent device cannot be suspended unless all of its child
+        example, a parent device cannot be suspended unless all of its child
         devices have been suspended). Moreover, depending on the bus type the
         device is on, it may be necessary to carry out some bus-specific
-        operations on the device for this purpose. Also, devices put into low
-        power states at run time may require special handling during system-wide
-        power transitions, like suspend to RAM.
+        operations on the device for this purpose. Devices put into low power
+        states at run time may require special handling during system-wide power
+        transitions (suspend or hibernation).
 
         For these reasons not only the device driver itself, but also the
-        appropriate subsystem (bus type, device type or device class) driver
-        and the PM core are involved in the runtime power management of devices.
-        Like in the system sleep power management case, they need to collaborate
-        by implementing various role-specific suspend and resume methods, so
-        that the hardware is cleanly powered down and reactivated without data
-        or service loss.
-
-There's not a lot to be said about those low power states except that they
-are very system-specific, and often device-specific. Also, that if enough
-devices have been put into low power states (at "run time"), the effect may be
-very similar to entering some system-wide low-power state (system sleep) ... and
-that synergies exist, so that several drivers using runtime PM might put the
-system into a state where even deeper power saving options are available.
-
-Most suspended devices will have quiesced all I/O: no more DMA or IRQs, no
-more data read or written, and requests from upstream drivers are no longer
-accepted. A given bus or platform may have different requirements though.
+        appropriate subsystem (bus type, device type or device class) driver and
+        the PM core are involved in runtime power management. As in the system
+        sleep power management case, they need to collaborate by implementing
+        various role-specific suspend and resume methods, so that the hardware
+        is cleanly powered down and reactivated without data or service loss.
+
+There's not a lot to be said about those low-power states except that they are
+very system-specific, and often device-specific. Also, that if enough devices
+have been put into low-power states (at runtime), the effect may be very similar
+to entering some system-wide low-power state (system sleep) ... and that
+synergies exist, so that several drivers using runtime PM might put the system
+into a state where even deeper power saving options are available.
+
+Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except
+for wakeup events), no more data read or written, and requests from upstream
+drivers are no longer accepted. A given bus or platform may have different
+requirements though.
 
 Examples of hardware wakeup events include an alarm from a real time clock,
 network wake-on-LAN packets, keyboard or mouse activity, and media insertion
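The runtime PM model above states that a parent device cannot be suspended unless all of its child devices have been suspended. The userspace C sketch below illustrates that rule with a per-parent count of active children. It is not kernel code. struct toy_dev, toy_runtime_suspend() and toy_runtime_resume() are hypothetical names and are not part of the runtime PM API described in Documentation/power/runtime_pm.txt. Resuming a parent before its child is an assumption made for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy device node; not a kernel data structure. */
struct toy_dev {
	const char *name;
	struct toy_dev *parent;
	bool suspended;       /* true while the device is in a low-power state */
	int active_children;  /* number of children that are not suspended */
};

/* The rule being modeled: suspend only when no child is still active. */
static bool toy_can_runtime_suspend(const struct toy_dev *dev)
{
	return dev->active_children == 0;
}

/* Returns 0 on success (or if already suspended), -1 if a child is active. */
static int toy_runtime_suspend(struct toy_dev *dev)
{
	if (dev->suspended)
		return 0;
	if (!toy_can_runtime_suspend(dev))
		return -1;
	dev->suspended = true;
	if (dev->parent) {
		/* The parent now has one fewer active child. */
		assert(dev->parent->active_children > 0);
		dev->parent->active_children--;
	}
	return 0;
}

/* Model assumption: the parent is brought back to full power first. */
static void toy_runtime_resume(struct toy_dev *dev)
{
	if (!dev->suspended)
		return;
	if (dev->parent)
		toy_runtime_resume(dev->parent);
	dev->suspended = false;
	if (dev->parent)
		dev->parent->active_children++;
}
```

With a hub that has one active port, suspending the hub first fails. Once the port is suspended, the hub can follow. Resuming the port then brings the hub back as well.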
@@ -72,10 +75,10 @@ or removal (for PCMCIA, MMC/SD, USB, and so on).
 
 Interfaces for Entering System Sleep States
 ===========================================
-There are programming interfaces provided for subsystem (bus type, device type,
-device class) and device drivers in order to allow them to participate in the
-power management of devices they are concerned with. They cover the system
-sleep power management as well as the runtime power management of devices.
+There are programming interfaces provided for subsystems (bus type, device type,
+device class) and device drivers to allow them to participate in the power
+management of devices they are concerned with. These interfaces cover both
+system sleep and runtime power management.
 
 
 Device Power Management Operations
@@ -106,16 +109,15 @@ struct dev_pm_ops {
 
 This structure is defined in include/linux/pm.h and the methods included in it
 are also described in that file. Their roles will be explained in what follows.
-For now, it should be sufficient to remember that the last three of them are
-specific to runtime power management, while the remaining ones are used during
+For now, it should be sufficient to remember that the last three methods are
+specific to runtime power management while the remaining ones are used during
 system-wide power transitions.
 
-There also is an "old" or "legacy", deprecated way of implementing power
-management operations available at least for some subsystems. This approach
-does not use struct dev_pm_ops objects and it only is suitable for implementing
-system sleep power management methods. Therefore it is not described in this
-document, so please refer directly to the source code for more information about
-it.
+There also is a deprecated "old" or "legacy" interface for power management
+operations available at least for some subsystems. This approach does not use
+struct dev_pm_ops objects and it is suitable only for implementing system sleep
+power management methods. Therefore it is not described in this document, so
+please refer directly to the source code for more information about it.
 
 
 Subsystem-Level Methods
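The bus type, device type and device class each carry their own method table. The updated "System Power Management Phases" text in this patch says that when several of these layers supply a callback for a phase, they run in the order class, type, bus during power-down transitions, and in the opposite order during power-up. The C sketch below models only that dispatch order. struct toy_pm_ops, struct toy_device and the toy_* functions are hypothetical stand-ins with just a suspend and a resume member. The real struct dev_pm_ops in include/linux/pm.h has many more members, and the real PM core also handles locking and unwinding, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_device;

/* Hypothetical cut-down method table: only two members are modeled. */
struct toy_pm_ops {
	int (*suspend)(struct toy_device *dev);
	int (*resume)(struct toy_device *dev);
};

/* Hypothetical device: one optional table per layer plus a call log. */
struct toy_device {
	const struct toy_pm_ops *class_pm; /* stands in for dev->class->pm */
	const struct toy_pm_ops *type_pm;  /* stands in for dev->type->pm */
	const struct toy_pm_ops *bus_pm;   /* stands in for dev->bus->pm */
	char log[16];                      /* one letter per callback invoked */
};

static void toy_log(struct toy_device *dev, const char *tag)
{
	strncat(dev->log, tag, sizeof(dev->log) - strlen(dev->log) - 1);
}

/* Upper case marks suspend callbacks, lower case marks resume callbacks. */
static int class_suspend(struct toy_device *d) { toy_log(d, "C"); return 0; }
static int type_suspend(struct toy_device *d)  { toy_log(d, "T"); return 0; }
static int bus_suspend(struct toy_device *d)   { toy_log(d, "B"); return 0; }
static int class_resume(struct toy_device *d)  { toy_log(d, "c"); return 0; }
static int type_resume(struct toy_device *d)   { toy_log(d, "t"); return 0; }
static int bus_resume(struct toy_device *d)    { toy_log(d, "b"); return 0; }

static const struct toy_pm_ops class_ops = { class_suspend, class_resume };
static const struct toy_pm_ops type_ops  = { type_suspend, type_resume };
static const struct toy_pm_ops bus_ops   = { bus_suspend, bus_resume };

/* Call the suspend or resume member of each table in the given order;
 * missing tables or members are skipped, and the first error stops. */
static int toy_run(struct toy_device *dev, const struct toy_pm_ops *order[3],
		   int resume)
{
	for (size_t i = 0; i < 3; i++) {
		int (*cb)(struct toy_device *) = NULL;

		if (order[i])
			cb = resume ? order[i]->resume : order[i]->suspend;
		if (cb) {
			int error = cb(dev);
			if (error)
				return error;
		}
	}
	return 0;
}

/* Power-down transition: class, then type, then bus. */
static int toy_device_suspend(struct toy_device *dev)
{
	assert(dev != NULL);
	const struct toy_pm_ops *order[3] = { dev->class_pm, dev->type_pm, dev->bus_pm };
	return toy_run(dev, order, 0);
}

/* Power-up transition: the opposite order, bus first and class last. */
static int toy_device_resume(struct toy_device *dev)
{
	assert(dev != NULL);
	const struct toy_pm_ops *order[3] = { dev->bus_pm, dev->type_pm, dev->class_pm };
	return toy_run(dev, order, 1);
}
```

Suspending and then resuming a device that has all three layers leaves "CTBbtc" in its log. A device without a device type only logs the class and bus callbacks.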
@@ -125,10 +127,10 @@ pointed to by the pm member of struct bus_type, struct device_type and
 struct class. They are mostly of interest to the people writing infrastructure
 for buses, like PCI or USB, or device type and device class drivers.
 
-Bus drivers implement these methods as appropriate for the hardware and
-the drivers using it; PCI works differently from USB, and so on. Not many
-people write subsystem-level drivers; most driver code is a "device driver" that
-builds on top of bus-specific framework code.
+Bus drivers implement these methods as appropriate for the hardware and the
+drivers using it; PCI works differently from USB, and so on. Not many people
+write subsystem-level drivers; most driver code is a "device driver" that builds
+on top of bus-specific framework code.
 
 For more information on these driver calls, see the description later;
 they are called in phases for every device, respecting the parent-child
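The note that these calls are made in phases for every device, respecting parent-child sequencing, can be pictured with a toy device list kept in registration order, so that every parent comes before its children. In the C sketch below (all names hypothetical, not kernel code), each phase visits every device before the next phase starts. Suspend-side phases walk the list backwards, so children go before parents, and resume-side phases walk it forwards. The registration-ordered list is a modeling choice. The sketch also ignores the prepare phase, which the updated text says traverses the tree top-down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy device entry; parent is an index into the same array. */
struct toy_dev {
	const char *name;
	int parent;   /* -1 for a root device */
};

/* The sketch relies on registration order: a parent precedes its children. */
static bool toy_list_is_ordered(const struct toy_dev *devs, int n)
{
	for (int i = 0; i < n; i++)
		if (devs[i].parent >= i)
			return false;
	return true;
}

/* Append "phase:name " to buf (total size len) for one device. */
static void toy_note(char *buf, size_t len, const char *phase, const char *name)
{
	size_t used = strlen(buf);

	snprintf(buf + used, len - used, "%s:%s ", phase, name);
}

/* Suspend-side phase: every device, children before parents. */
static void toy_suspend_phase(const struct toy_dev *devs, int n,
			      const char *phase, char *buf, size_t len)
{
	assert(toy_list_is_ordered(devs, n));
	for (int i = n - 1; i >= 0; i--)
		toy_note(buf, len, phase, devs[i].name);
}

/* Resume-side phase: every device, parents before children. */
static void toy_resume_phase(const struct toy_dev *devs, int n,
			     const char *phase, char *buf, size_t len)
{
	assert(toy_list_is_ordered(devs, n));
	for (int i = 0; i < n; i++)
		toy_note(buf, len, phase, devs[i].name);
}
```

Running a "suspend" phase and then a "noirq" phase over pci0 -> usb1 -> kbd visits kbd, usb1, pci0 in each phase, and the first phase finishes for all three before the second begins.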
@@ -137,66 +139,78 @@ sequencing in the driver model tree.
 
 /sys/devices/.../power/wakeup files
 -----------------------------------
-All devices in the driver model have two flags to control handling of
-wakeup events, which are hardware signals that can force the device and/or
-system out of a low power state. These are initialized by bus or device
-driver code using device_init_wakeup().
+All devices in the driver model have two flags to control handling of wakeup
+events (hardware signals that can force the device and/or system out of a low
+power state). These flags are initialized by bus or device driver code using
+device_set_wakeup_capable() and device_set_wakeup_enable(), defined in
+include/linux/pm_wakeup.h.
 
 The "can_wakeup" flag just records whether the device (and its driver) can
-physically support wakeup events. When that flag is clear, the sysfs
-"wakeup" file is empty, and device_may_wakeup() returns false.
-
-For devices that can issue wakeup events, a separate flag controls whether
-that device should try to use its wakeup mechanism. The initial value of
-device_may_wakeup() will be false for the majority of devices, except for
-power buttons, keyboards, and Ethernet adapters whose WoL (wake-on-LAN) feature
-has been set up with ethtool. Thus in the majority of cases the device's
-"wakeup" file will initially hold the value "disabled". Userspace can change
-that to "enabled", so that device_may_wakeup() returns true, or change it back
-to "disabled", so that it returns false again.
+physically support wakeup events. The device_set_wakeup_capable() routine
+affects this flag. The "should_wakeup" flag controls whether the device should
+try to use its wakeup mechanism. device_set_wakeup_enable() affects this flag;
+for the most part drivers should not change its value. The initial value of
+should_wakeup is supposed to be false for the majority of devices; the major
+exceptions are power buttons, keyboards, and Ethernet adapters whose WoL
+(wake-on-LAN) feature has been set up with ethtool.
+
+Whether or not a device is capable of issuing wakeup events is a hardware
+matter, and the kernel is responsible for keeping track of it. By contrast,
+whether or not a wakeup-capable device should issue wakeup events is a policy
+decision, and it is managed by user space through a sysfs attribute: the
+power/wakeup file. User space can write the strings "enabled" or "disabled" to
+set or clear the should_wakeup flag, respectively. Reads from the file will
+return the corresponding string if can_wakeup is true, but if can_wakeup is
+false then reads will return an empty string, to indicate that the device
+doesn't support wakeup events. (But even though the file appears empty, writes
+will still affect the should_wakeup flag.)
+
+The device_may_wakeup() routine returns true only if both flags are set.
+Drivers should check this routine when putting devices in a low-power state
+during a system sleep transition, to see whether or not to enable the devices'
+wakeup mechanisms. However for runtime power management, wakeup events should
+be enabled whenever the device and driver both support them, regardless of the
+should_wakeup flag.
 
 
 /sys/devices/.../power/control files
 ------------------------------------
-All devices in the driver model have a flag to control the desired behavior of
-its driver with respect to runtime power management. This flag, called
-runtime_auto, is initialized by the bus type (or generally subsystem) code using
-pm_runtime_allow() or pm_runtime_forbid(), depending on whether or not the
-driver is supposed to power manage the device at run time by default,
-respectively.
-
-This setting may be adjusted by user space by writing either "on" or "auto" to
-the device's "control" file. If "auto" is written, the device's runtime_auto
-flag will be set and the driver will be allowed to power manage the device if
-capable of doing that. If "on" is written, the driver is not allowed to power
-manage the device which in turn is supposed to remain in the full power state at
-run time. User space can check the current value of the runtime_auto flag by
-reading from the device's "control" file.
+Each device in the driver model has a flag to control whether it is subject to
+runtime power management. This flag, called runtime_auto, is initialized by the
+bus type (or generally subsystem) code using pm_runtime_allow() or
+pm_runtime_forbid(); the default is to allow runtime power management.
+
+The setting can be adjusted by user space by writing either "on" or "auto" to
+the device's power/control sysfs file. Writing "auto" calls pm_runtime_allow(),
+setting the flag and allowing the device to be runtime power-managed by its
+driver. Writing "on" calls pm_runtime_forbid(), clearing the flag, returning
+the device to full power if it was in a low-power state, and preventing the
+device from being runtime power-managed. User space can check the current value
+of the runtime_auto flag by reading the file.
 
 The device's runtime_auto flag has no effect on the handling of system-wide
-power transitions by its driver. In particular, the device can (and in the
-majority of cases should and will) be put into a low power state during a
-system-wide transition to a sleep state (like "suspend-to-RAM") even though its
-runtime_auto flag is unset (in which case its "control" file contains "on").
+power transitions. In particular, the device can (and in the majority of cases
+should and will) be put into a low-power state during a system-wide transition
+to a sleep state even though its runtime_auto flag is clear.
 
-For more information about the runtime power management framework for devices
-refer to Documentation/power/runtime_pm.txt.
+For more information about the runtime power management framework, refer to
+Documentation/power/runtime_pm.txt.
 
 
-Calling Drivers to Enter System Sleep States
-============================================
-When the system goes into a sleep state, each device's driver is asked
-to suspend the device by putting it into state compatible with the target
+Calling Drivers to Enter and Leave System Sleep States
+======================================================
+When the system goes into a sleep state, each device's driver is asked to
+suspend the device by putting it into a state compatible with the target
 system state. That's usually some version of "off", but the details are
 system-specific. Also, wakeup-enabled devices will usually stay partly
 functional in order to wake the system.
 
-When the system leaves that low power state, the device's driver is asked
-to resume it. The suspend and resume operations always go together, and
-both are multi-phase operations.
+When the system leaves that low-power state, the device's driver is asked to
+resume it by returning it to full power. The suspend and resume operations
+always go together, and both are multi-phase operations.
 
-For simple drivers, suspend might quiesce the device using the class code
-and then turn its hardware as "off" as possible with late_suspend. The
+For simple drivers, suspend might quiesce the device using class code
+and then turn its hardware as "off" as possible during suspend_noirq. The
 matching resume calls would then completely reinitialize the hardware
 before reactivating its class I/O queues.
 
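The rules for the power/wakeup and power/control files can be summarized in a small userspace model. This C sketch is illustrative only. struct toy_pm_flags and the toy_* functions are hypothetical, and the field names simply follow the text. The functions mimic the sysfs behavior described above rather than reproduce the kernel's attribute handlers, and they compare exact strings for simplicity. In the real kernel, writing "on" also returns the device to full power; this model only tracks the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the per-device flags named in the text. */
struct toy_pm_flags {
	bool can_wakeup;    /* hardware capability, tracked by the kernel */
	bool should_wakeup; /* policy, set from user space via power/wakeup */
	bool runtime_auto;  /* runtime PM allowed ("auto") or forbidden ("on") */
};

/* Like device_may_wakeup(): true only if both wakeup flags are set. */
static bool toy_may_wakeup(const struct toy_pm_flags *f)
{
	return f->can_wakeup && f->should_wakeup;
}

/* Reading power/wakeup: empty string when the device cannot wake the system. */
static const char *toy_wakeup_show(const struct toy_pm_flags *f)
{
	if (!f->can_wakeup)
		return "";
	return f->should_wakeup ? "enabled" : "disabled";
}

/* Writing power/wakeup: sets or clears should_wakeup even when the file
 * reads back empty. Returns 0 on success, -1 for any other string. */
static int toy_wakeup_store(struct toy_pm_flags *f, const char *buf)
{
	assert(f != NULL && buf != NULL);
	if (strcmp(buf, "enabled") == 0)
		f->should_wakeup = true;
	else if (strcmp(buf, "disabled") == 0)
		f->should_wakeup = false;
	else
		return -1;
	return 0;
}

/* Writing power/control: "auto" sets runtime_auto, "on" clears it. */
static int toy_control_store(struct toy_pm_flags *f, const char *buf)
{
	assert(f != NULL && buf != NULL);
	if (strcmp(buf, "auto") == 0)
		f->runtime_auto = true;
	else if (strcmp(buf, "on") == 0)
		f->runtime_auto = false;
	else
		return -1;
	return 0;
}

/* Reading power/control reports the runtime_auto flag. */
static const char *toy_control_show(const struct toy_pm_flags *f)
{
	return f->runtime_auto ? "auto" : "on";
}
```

A device initialized as { false, false, true } reads back an empty wakeup string. Writing "enabled" still sets should_wakeup, but toy_may_wakeup() stays false until can_wakeup is also set. The runtime_auto field starts out set, matching the stated default of allowing runtime power management.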
@@ -224,269 +238,129 @@ devices have been suspended. Device drivers must be prepared to cope with such
 situations.
 
 
-Suspending Devices
-------------------
-Suspending a given device is done in several phases. Suspending the
-system always includes every phase, executing calls for every device
-before the next phase begins. Not all busses or classes support all
-these callbacks; and not all drivers use all the callbacks.
-
-Generally, different callbacks are used depending on whether the system is
-going to the standby or memory sleep state ("suspend-to-RAM") or it is going to
-be hibernated ("suspend-to-disk").
+System Power Management Phases
+------------------------------
+Suspending or resuming the system is done in several phases. Different phases
+are used for standby or memory sleep states ("suspend-to-RAM") and the
+hibernation state ("suspend-to-disk"). Each phase involves executing callbacks
+for every device before the next phase begins. Not all busses or classes
+support all these callbacks and not all drivers use all the callbacks. The
+various phases always run after tasks have been frozen and before they are
+unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have
+been disabled (except for those marked with the IRQ_WAKEUP flag).
 
-If the system goes to the standby or memory sleep state the phases are seen by
-driver notifications issued in this order:
+Most phases use bus, type, and class callbacks (that is, methods defined in
+dev->bus->pm, dev->type->pm, and dev->class->pm). The prepare and complete
+phases are exceptions; they use only bus callbacks. When multiple callbacks
+are used in a phase, they are invoked in the order: <class, type, bus> during
+power-down transitions and in the opposite order during power-up transitions.
+For example, during the suspend phase the PM core invokes
 
-    1   bus->pm.prepare(dev) is called after tasks are frozen and it is supposed
-        to call the device driver's ->pm.prepare() method.
+        dev->class->pm.suspend(dev);
+        dev->type->pm.suspend(dev);
+        dev->bus->pm.suspend(dev);
 
-        The purpose of this method is mainly to prevent new children of the
-        device from being registered after it has returned. It also may be used
-        to generally prepare the device for the upcoming system transition, but
-        it should not put the device into a low power state.
+before moving on to the next device, whereas during the resume phase the core
+invokes
 
-    2   class->pm.suspend(dev) is called if dev is associated with a class that
-        has such a method. It may invoke the device driver's ->pm.suspend()
-        method, unless type->pm.suspend(dev) or bus->pm.suspend() does that.
+        dev->bus->pm.resume(dev);
+        dev->type->pm.resume(dev);
+        dev->class->pm.resume(dev);
 
-    3   type->pm.suspend(dev) is called if dev is associated with a device type
-        that has such a method. It may invoke the device driver's
-        ->pm.suspend() method, unless class->pm.suspend(dev) or
-        bus->pm.suspend() does that.
+These callbacks may in turn invoke device- or driver-specific methods stored in
+dev->driver->pm, but they don't have to.
 
-    4   bus->pm.suspend(dev) is called, if implemented. It usually calls the
-        device driver's ->pm.suspend() method.
 
-        This call should generally quiesce the device so that it doesn't do any
-        I/O after the call has returned. It also may save the device registers
-        and put it into the appropriate low power state, depending on the bus
-        type the device is on.
-
-    5   bus->pm.suspend_noirq(dev) is called, if implemented. It may call the
-        device driver's ->pm.suspend_noirq() method, depending on the bus type
-        in question.
-
-        This method is invoked after device interrupts have been suspended,
-        which means that the driver's interrupt handler will not be called
-        while it is running. It should save the values of the device's
-        registers that weren't saved previously and finally put the device into
-        the appropriate low power state.
+Entering System Suspend
+-----------------------
+When the system goes into the standby or memory sleep state, the phases are:
+
+        prepare, suspend, suspend_noirq.
+
+    1.  The prepare phase is meant to prevent races by preventing new devices
+        from being registered; the PM core would never know that all the
+        children of a device had been suspended if new children could be
+        registered at will. (By contrast, devices may be unregistered at any
+        time.) Unlike the other suspend-related phases, during the prepare
+        phase the device tree is traversed top-down.
+
+        The prepare phase uses only a bus callback. After the callback method
+        returns, no new children may be registered below the device. The method
+        may also prepare the device or driver in some way for the upcoming
+        system power transition, but it should not put the device into a
+        low-power state.
+
+    2.  The suspend methods should quiesce the device to stop it from performing
+        I/O. They also may save the device registers and put it into the
+        appropriate low-power state, depending on the bus type the device is on,
+        and they may enable wakeup events.
+
+    3.  The suspend_noirq phase occurs after IRQ handlers have been disabled,
+        which means that the driver's interrupt handler will not be called while
+        the callback method is running. The methods should save the values of
+        the device's registers that weren't saved previously and finally put the
+        device into the appropriate low-power state.
 
         The majority of subsystems and device drivers need not implement this
-        method. However, bus types allowing devices to share interrupt vectors,
-        like PCI, generally need to use it to prevent interrupt handling issues
-        from happening during suspend.
-
-At the end of those phases, drivers should normally have stopped all I/O
-transactions (DMA, IRQs), saved enough state that they can re-initialize
-or restore previous state (as needed by the hardware), and placed the
-device into a low-power state. On many platforms they will also use
-gate off one or more clock sources; sometimes they will also switch off power
-supplies, or reduce voltages. [Drivers supporting runtime PM may already have
-performed some or all of the steps needed to prepare for the upcoming system
-state transition.]
+        callback. However, bus types allowing devices to share interrupt
+        vectors, like PCI, generally need it; otherwise a driver might encounter
+        an error during the suspend phase by fielding a shared interrupt
+        generated by some other device after its own device had been set to low
+        power.
+
+At the end of these phases, drivers should have stopped all I/O transactions
+(DMA, IRQs), saved enough state that they can re-initialize or restore previous
+state (as needed by the hardware), and placed the device into a low-power state.
+On many platforms they will gate off one or more clock sources; sometimes they
+will also switch off power supplies or reduce voltages. (Drivers supporting
+runtime PM may already have performed some or all of these steps.)
 
 If device_may_wakeup(dev) returns true, the device should be prepared for
-generating hardware wakeup signals when the system is in the sleep state to
-trigger a system wakeup event. For example, enable_irq_wake() might identify
+generating hardware wakeup signals to trigger a system wakeup event when the
+system is in the sleep state. For example, enable_irq_wake() might identify
 GPIO signals hooked up to a switch or other external hardware, and
 pci_enable_wake() does something similar for the PCI PME signal.
 
-If a driver (or subsystem) fails it suspend method, the system won't enter the
-desired low power state; it will resume all the devices it's suspended so far.
-
+If any of these callbacks returns an error, the system won't enter the desired
+low-power state. Instead the PM core will unwind its actions by resuming all
+the devices that were suspended.
-
-Hibernation Phases
-------------------
-Hibernating the system is more complicated than putting it into the standby or
-memory sleep state, because it involves creating a system image and saving it.
-Therefore there are more phases of hibernation and special device PM methods are
-used in this case.
-
-First, it is necessary to prepare the system for creating a hibernation image.
-This is similar to putting the system into the standby or memory sleep state,
-although it generally doesn't require that devices be put into low power states
-(that is even not desirable at this point). Driver notifications are then
-issued in the following order:
-
-    1   bus->pm.prepare(dev) is called after tasks have been frozen and enough
-        memory has been freed.
-
-    2   class->pm.freeze(dev) is called if implemented. It may invoke the
-        device driver's ->pm.freeze() method, unless type->pm.freeze(dev) or
-        bus->pm.freeze() does that.
-
-    3   type->pm.freeze(dev) is called if implemented. It may invoke the device
-        driver's ->pm.suspend() method, unless class->pm.freeze(dev) or
-        bus->pm.freeze() does that.
-
-    4   bus->pm.freeze(dev) is called, if implemented. It usually calls the
-        device driver's ->pm.freeze() method.
-
-    5   bus->pm.freeze_noirq(dev) is called, if implemented. It may call the
-        device driver's ->pm.freeze_noirq() method, depending on the bus type
-        in question.
-
-The difference between ->pm.freeze() and the corresponding ->pm.suspend() (and
-similarly for the "noirq" variants) is that the former should avoid preparing
-devices to trigger system wakeup events and putting devices into low power
-states, although they generally have to save the values of device registers
-so that it's possible to restore them during system resume.
-
-Second, after the system image has been created, the functionality of devices
-has to be restored so that the image can be saved. That is similar to resuming
-devices after the system has been woken up from the standby or memory sleep
-state, which is described below, and causes the following device notifications
-to be issued:
-
-    1   bus->pm.thaw_noirq(dev), if implemented; may call the device driver's
-        ->pm.thaw_noirq() method, depending on the bus type in question.
-
-    2   bus->pm.thaw(dev), if implemented; usually calls the device driver's
-        ->pm.thaw() method.
-
-    3   type->pm.thaw(dev), if implemented; may call the device driver's
-        ->pm.thaw() method if not called by the bus type or class.
-
-    4   class->pm.thaw(dev), if implemented; may call the device driver's
-        ->pm.thaw() method if not called by the bus type or device type.
-
-    5   bus->pm.complete(dev), if implemented; may call the device driver's
-        ->pm.complete() method.
-
-Generally, the role of the ->pm.thaw() methods (including the "noirq" variants)
-is to bring the device back to the fully functional state, so that it may be
-used for saving the image, if necessary. The role of bus->pm.complete() is to
-reverse whatever bus->pm.prepare() did (likewise for the analogous device driver
-callbacks).
-
-After the image has been saved, the devices need to be prepared for putting the
-system into the low power state. That is analogous to suspending them before
-putting the system into the standby or memory sleep state and involves the
-following device notifications:
-
-    1   bus->pm.prepare(dev).
-
-    2   class->pm.poweroff(dev), if implemented; may invoke the device driver's
-        ->pm.poweroff() method if not called by the bus type or device type.
-
-    3   type->pm.poweroff(dev), if implemented; may invoke the device driver's
-        ->pm.poweroff() method if not called by the bus type or device class.
-
-    4   bus->pm.poweroff(dev), if implemented; usually calls the device driver's
-        ->pm.poweroff() method (if not called by the device class or type).
-
-    5   bus->pm.poweroff_noirq(dev), if implemented; may call the device
-        driver's ->pm.poweroff_noirq() method, depending on the bus type
-        in question.
-
-The difference between ->pm.poweroff() and the corresponding ->pm.suspend() (and
-analogously for the "noirq" variants) is that the former need not save the
-device's registers. Still, they should prepare the device for triggering
-system wakeup events if necessary and finally put it into the appropriate low
-power state.
-
-
-Device Low Power (suspend) States
----------------------------------
-Device low-power states aren't standard. One device might only handle
-"on" and "off, while another might support a dozen different versions of
-"on" (how many engines are active?), plus a state that gets back to "on"
-faster than from a full "off".
-
-Some busses define rules about what different suspend states mean. PCI
-gives one example: after the suspend sequence completes, a non-legacy
-PCI device may not perform DMA or issue IRQs, and any wakeup events it
-issues would be issued through the PME# bus signal. Plus, there are
-several PCI-standard device states, some of which are optional.
-
-In contrast, integrated system-on-chip processors often use IRQs as the
-wakeup event sources (so drivers would call enable_irq_wake) and might
-be able to treat DMA completion as a wakeup event (sometimes DMA can stay
-active too, it'd only be the CPU and some peripherals that sleep).
-
-Some details here may be platform-specific. Systems may have devices that
-can be fully active in certain sleep states, such as an LCD display that's
-refreshed using DMA while most of the system is sleeping lightly ... and
412its frame buffer might even be updated by a DSP or other non-Linux CPU while
413the Linux control processor stays idle.
414
415Moreover, the specific actions taken may depend on the target system state.
416One target system state might allow a given device to be very operational;
417another might require a hard shut down with re-initialization on resume.
418And two different target systems might use the same device in different
419ways; the aforementioned LCD might be active in one product's "standby",
420but a different product using the same SOC might work differently.
421 327
422 328
423Resuming Devices 329Leaving System Suspend
424---------------- 330----------------------
425Resuming is done in multiple phases, much like suspending, with all 331When resuming from standby or memory sleep, the phases are:
426devices processing each phase's calls before the next phase begins.
427 332
428Again, however, different callbacks are used depending on whether the system is 333 resume_noirq, resume, complete.
429waking up from the standby or memory sleep state ("suspend-to-RAM") or from
430hibernation ("suspend-to-disk").
431 334
432If the system is waking up from the standby or memory sleep state, the phases 335 1. The resume_noirq callback methods should perform any actions needed
433are seen by driver notifications issued in this order: 336 before the driver's interrupt handlers are invoked. This generally
434 337 means undoing the actions of the suspend_noirq phase. If the bus type
435 1 bus->pm.resume_noirq(dev) is called, if implemented. It may call the 338 permits devices to share interrupt vectors, like PCI, the method should
436 device driver's ->pm.resume_noirq() method, depending on the bus type in 339 bring the device and its driver into a state in which the driver can
437 question. 340 recognize if the device is the source of incoming interrupts, if any,
438 341 and handle them correctly.
439 The role of this method is to perform actions that need to be performed
440 before device drivers' interrupt handlers are allowed to be invoked. If
441 the given bus type permits devices to share interrupt vectors, like PCI,
442 this method should bring the device and its driver into a state in which
443 the driver can recognize if the device is the source of incoming
444 interrupts, if any, and handle them correctly.
445 342
446 For example, the PCI bus type's ->pm.resume_noirq() puts the device into 343 For example, the PCI bus type's ->pm.resume_noirq() puts the device into
447 the full power state (D0 in the PCI terminology) and restores the 344 the full-power state (D0 in the PCI terminology) and restores the
448 standard configuration registers of the device. Then, it calls the 345 standard configuration registers of the device. Then it calls the
449 device driver's ->pm.resume_noirq() method to perform device-specific 346 device driver's ->pm.resume_noirq() method to perform device-specific
450 actions needed at this stage of resume. 347 actions.
451
452 2 bus->pm.resume(dev) is called, if implemented. It usually calls the
453 device driver's ->pm.resume() method.
454
455 This call should generally bring the the device back to the working
456 state, so that it can do I/O as requested after the call has returned.
457 However, it may be more convenient to use the device class or device
458 type ->pm.resume() for this purpose, in which case the bus type's
459 ->pm.resume() method need not be implemented at all.
460
461 3 type->pm.resume(dev) is called, if implemented. It may invoke the
462 device driver's ->pm.resume() method, unless class->pm.resume(dev) or
463 bus->pm.resume() does that.
464
465 For devices that are not associated with any bus type or device class
466 this method plays the role of bus->pm.resume().
467
468 4 class->pm.resume(dev) is called, if implemented. It may invoke the
469 device driver's ->pm.resume() method, unless bus->pm.resume(dev) or
470 type->pm.resume() does that.
471
472 For devices that are not associated with any bus type or device type
473 this method plays the role of bus->pm.resume().
474 348
475 5 bus->pm.complete(dev) is called, if implemented. It is supposed to 349 2. The resume methods should bring the device back to its operating
476 invoke the device driver's ->pm.complete() method. 350 state, so that it can perform normal I/O. This generally involves
351 undoing the actions of the suspend phase.
477 352
478 The role of this method is to reverse whatever bus->pm.prepare(dev) 353 3. The complete phase uses only a bus callback. The method should undo the
479 (or the driver's ->pm.prepare()) did during suspend, if necessary. 354 actions of the prepare phase. Note, however, that new children may be
355 registered below the device as soon as the resume callbacks occur; it's
356 not necessary to wait until the complete phase.
480 357
481At the end of those phases, drivers should normally be as functional as 358At the end of these phases, drivers should be as functional as they were before
482they were before suspending: I/O can be performed using DMA and IRQs, and 359suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
483the relevant clocks are gated on. In principle the device need not be 360gated on. Even if the device was in a low-power state before the system sleep
484"fully on"; it might be in a runtime lowpower/suspend state during suspend and 361because of runtime power management, afterwards it should be back in its
485the resume callbacks may try to restore that state, but that need not be 362full-power state. There are multiple reasons why it's best to do this; they are
486desirable from the user's point of view. In fact, there are multiple reasons 363discussed in more detail in Documentation/power/runtime_pm.txt.
487why it's better to always put devices into the "fully working" state in the
488system sleep resume callbacks and they are discussed in more detail in
489Documentation/power/runtime_pm.txt.
490 364
491However, the details here may again be platform-specific. For example, 365However, the details here may again be platform-specific. For example,
492some systems support multiple "run" states, and the mode in effect at 366some systems support multiple "run" states, and the mode in effect at
@@ -502,103 +376,156 @@ the suspend was carried out, but that can't be guaranteed (in fact, it ususally
502is not the case). 376is not the case).
503 377
504Drivers must also be prepared to notice that the device has been removed 378Drivers must also be prepared to notice that the device has been removed
505while the system was powered off, whenever that's physically possible. 379while the system was powered down, whenever that's physically possible.
506PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses 380PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses
507where common Linux platforms will see such removal. Details of how drivers 381where common Linux platforms will see such removal. Details of how drivers
508will notice and handle such removals are currently bus-specific, and often 382will notice and handle such removals are currently bus-specific, and often
509involve a separate thread. 383involve a separate thread.
510 384
385These callbacks may return an error value, but the PM core will ignore such
386errors since there's nothing it can do about them other than printing them in
387the system log.
511 388
512Resume From Hibernation 389
513----------------------- 390Entering Hibernation
391--------------------
392Hibernating the system is more complicated than putting it into the standby or
393memory sleep state, because it involves creating and saving a system image.
394Therefore there are more phases for hibernation, with a different set of
395callbacks. These phases always run after tasks have been frozen and memory has
396been freed.
397
398The general procedure for hibernation is to quiesce all devices (freeze), create
399an image of the system memory while everything is stable, reactivate all
400devices (thaw), write the image to permanent storage, and finally shut down the
401system (poweroff). The phases used to accomplish this are:
402
403 prepare, freeze, freeze_noirq, thaw_noirq, thaw, complete,
404 prepare, poweroff, poweroff_noirq
405
406 1. The prepare phase is discussed in the "Entering System Suspend" section
407 above.
408
409 2. The freeze methods should quiesce the device so that it doesn't generate
410 IRQs or DMA, and they may need to save the values of device registers.
411 However the device does not have to be put in a low-power state, and to
412 save time it's best not to do so. Also, the device should not be
413 prepared to generate wakeup events.
414
415 3. The freeze_noirq phase is analogous to the suspend_noirq phase discussed
416 above, except again that the device should not be put in a low-power
417 state and should not be allowed to generate wakeup events.
418
419At this point the system image is created. All devices should be inactive and
420the contents of memory should remain undisturbed while this happens, so that the
421image forms an atomic snapshot of the system state.
422
423 4. The thaw_noirq phase is analogous to the resume_noirq phase discussed
424 above. The main difference is that its methods can assume the device is
425 in the same state as at the end of the freeze_noirq phase.
426
427 5. The thaw phase is analogous to the resume phase discussed above. Its
428 methods should bring the device back to an operating state, so that it
429 can be used for saving the image if necessary.
430
431 6. The complete phase is discussed in the "Leaving System Suspend" section
432 above.
433
434At this point the system image is saved, and the devices then need to be
435prepared for the upcoming system shutdown. This is much like suspending them
436before putting the system into the standby or memory sleep state, and the phases
437are similar.
438
439 7. The prepare phase is discussed above.
440
441 8. The poweroff phase is analogous to the suspend phase.
442
443 9. The poweroff_noirq phase is analogous to the suspend_noirq phase.
444
445The poweroff and poweroff_noirq callbacks should do essentially the same things
446as the suspend and suspend_noirq callbacks. The only notable difference is that
447they need not store the device register values, because the registers should
448already have been stored during the freeze or freeze_noirq phases.
449
450
451Leaving Hibernation
452-------------------
514Resuming from hibernation is, again, more complicated than resuming from a sleep 453Resuming from hibernation is, again, more complicated than resuming from a sleep
515state in which the contents of main memory are preserved, because it requires 454state in which the contents of main memory are preserved, because it requires
516a system image to be loaded into memory and the pre-hibernation memory contents 455a system image to be loaded into memory and the pre-hibernation memory contents
517to be restored before control can be passed back to the image kernel. 456to be restored before control can be passed back to the image kernel.
518 457
519In principle, the image might be loaded into memory and the pre-hibernation 458Although in principle, the image might be loaded into memory and the
520memory contents might be restored by the boot loader. For this purpose, 459pre-hibernation memory contents restored by the boot loader, in practice this
521however, the boot loader would need to know the image kernel's entry point and 460can't be done because boot loaders aren't smart enough and there is no
522there's no protocol defined for passing that information to boot loaders. As 461established protocol for passing the necessary information. So instead, the
523a workaround, the boot loader loads a fresh instance of the kernel, called the 462boot loader loads a fresh instance of the kernel, called the boot kernel, into
524boot kernel, into memory and passes control to it in a usual way. Then, the 463memory and passes control to it in the usual way. Then the boot kernel reads
525boot kernel reads the hibernation image, restores the pre-hibernation memory 464the system image, restores the pre-hibernation memory contents, and passes
526contents and passes control to the image kernel. Thus, in fact, two different 465control to the image kernel. Thus two different kernels are involved in
527kernels are involved in resuming from hibernation and in general they are not 466resuming from hibernation. In fact, the boot kernel may be completely different
528only different because they play different roles in this operation. Actually, 467from the image kernel: a different configuration and even a different version.
529the boot kernel may be completely different from the image kernel. Not only 468This has important consequences for device drivers and their subsystems.
530the configuration of it, but also the version of it may be different. 469
531The consequences of this are important to device drivers and their subsystems 470To be able to load the system image into memory, the boot kernel needs to
532(bus types, device classes and device types) too. 471include at least a subset of device drivers allowing it to access the storage
533 472medium containing the image, although it doesn't need to include all of the
534Namely, to be able to load the hibernation image into memory, the boot kernel 473drivers present in the image kernel. After the image has been loaded, the
535needs to include at least the subset of device drivers allowing it to access the 474devices managed by the boot kernel need to be prepared for passing control back
536storage medium containing the image, although it generally doesn't need to 475to the image kernel. This is very similar to the initial steps involved in
537include all of the drivers included into the image kernel. After the image has 476creating a system image, and it is accomplished in the same way, using prepare,
538been loaded the devices handled by those drivers need to be prepared for passing 477freeze, and freeze_noirq phases. However the devices affected by these phases
539control back to the image kernel. This is very similar to the preparation of 478are only those having drivers in the boot kernel; other devices will still be in
540devices for creating a hibernation image described above. In fact, it is done 479whatever state the boot loader left them.
541in the same way, with the help of the ->pm.prepare(), ->pm.freeze() and
542->pm.freeze_noirq() callbacks, but only for device drivers included in the boot
543kernel (whose versions may generally be different from the versions of the
544analogous drivers from the image kernel).
545 480
546Should the restoration of the pre-hibernation memory contents fail, the boot 481Should the restoration of the pre-hibernation memory contents fail, the boot
547kernel would carry out the procedure of "thawing" devices described above, using 482kernel would go through the "thawing" procedure described above, using the
548the ->pm.thaw_noirq(), ->pm.thaw(), and ->pm.complete() callbacks provided by 483thaw_noirq, thaw, and complete phases, and then continue running normally. This
549subsystems and device drivers. This, however, is a very rare condition. Most 484happens only rarely. Most often the pre-hibernation memory contents are
550often the pre-hibernation memory contents are restored successfully and control 485restored successfully and control is passed to the image kernel, which then
551is passed to the image kernel that is now responsible for bringing the system 486becomes responsible for bringing the system back to the working state.
552back to the working state.
553 487
554To achieve this goal, among other things, the image kernel restores the 488To achieve this, the image kernel must restore the devices' pre-hibernation
555pre-hibernation functionality of devices. This operation is analogous to the 489functionality. The operation is much like waking up from the memory sleep
556resuming of devices after waking up from the memory sleep state, although it 490state, although it involves different phases:
557involves different device notifications which are the following:
558 491
559 1 bus->pm.restore_noirq(dev), if implemented; may call the device driver's 492 restore_noirq, restore, complete
560 ->pm.restore_noirq() method, depending on the bus type in question.
561 493
562 2 bus->pm.restore(dev), if implemented; usually calls the device driver's 494 1. The restore_noirq phase is analogous to the resume_noirq phase.
563 ->pm.restore() method.
564 495
565 3 type->pm.restore(dev), if implemented; may call the device driver's 496 2. The restore phase is analogous to the resume phase.
566 ->pm.restore() method if not called by the bus type or class.
567 497
568 4 class->pm.restore(dev), if implemented; may call the device driver's 498 3. The complete phase is discussed above.
569 ->pm.restore() method if not called by the bus type or device type.
570 499
571 5 bus->pm.complete(dev), if implemented; may call the device driver's 500The main difference from resume[_noirq] is that restore[_noirq] must assume the
572 ->pm.complete() method. 501device has been accessed and reconfigured by the boot loader or the boot kernel.
573 502Consequently the state of the device may be different from the state remembered
574The roles of the ->pm.restore_noirq() and ->pm.restore() callbacks are analogous 503from the freeze and freeze_noirq phases. The device may even need to be reset
575to the roles of the corresponding resume callbacks, but they must assume that 504and completely re-initialized. In many cases this difference doesn't matter, so
576the device may have been accessed before by the boot kernel. Consequently, the 505the resume[_noirq] and restore[_noirq] method pointers can be set to the same
577state of the device before they are called may be different from the state of it 506routines. Nevertheless, different callback pointers are used in case there is a
578right prior to calling the resume callbacks. That difference usually doesn't 507situation where it actually matters.
579matter, so the majority of device drivers can set their resume and restore
580callback pointers to the same routine. Nevertheless, different callback
581pointers are used in case there is a situation where it actually matters.
582 508
583 509
584System Devices 510System Devices
585-------------- 511--------------
586System devices follow a slightly different API, which can be found in 512System devices (sysdevs) follow a slightly different API, which can be found in
587 513
588 include/linux/sysdev.h 514 include/linux/sysdev.h
589 drivers/base/sys.c 515 drivers/base/sys.c
590 516
591System devices will only be suspended with interrupts disabled, and after 517System devices will be suspended with interrupts disabled, and after all other
592all other devices have been suspended. On resume, they will be resumed 518devices have been suspended. On resume, they will be resumed before any other
593before any other devices, and also with interrupts disabled. 519devices, and also with interrupts disabled. These things occur in special
520"sysdev_driver" phases, which affect only system devices.
594 521
595That is, when the non-boot CPUs are all offline and IRQs are disabled on the 522Thus, after the suspend_noirq (or freeze_noirq or poweroff_noirq) phase, when
596remaining online CPU, then the sysdev_driver.suspend() phase is carried out, and 523the non-boot CPUs are all offline and IRQs are disabled on the remaining online
597the system enters a sleep state (or hibernation image is created). During 524CPU, then a sysdev_driver.suspend phase is carried out, and the system enters a
598resume (or after the image has been created) the sysdev_driver.resume() phase 525sleep state (or a system image is created). During resume (or after the image
599is carried out, IRQs are enabled on the only online CPU, the non-boot CPUs are 526has been created or loaded) a sysdev_driver.resume phase is carried out, IRQs
600enabled and that is followed by the "early resume" phase (in which the "noirq" 527are enabled on the only online CPU, the non-boot CPUs are enabled, and the
601callbacks provided by subsystems and device drivers are invoked). 528resume_noirq (or thaw_noirq or restore_noirq) phase begins.
602 529
603Code to actually enter and exit the system-wide low power state sometimes 530Code to actually enter and exit the system-wide low power state sometimes
604involves hardware details that are only known to the boot firmware, and 531involves hardware details that are only known to the boot firmware, and
@@ -606,18 +533,47 @@ may leave a CPU running software (from SRAM or flash memory) that monitors
606the system and manages its wakeup sequence. 533the system and manages its wakeup sequence.
607 534
608 535
536Device Low Power (suspend) States
537---------------------------------
538Device low-power states aren't standard. One device might only handle
539"on" and "off", while another might support a dozen different versions of
540"on" (how many engines are active?), plus a state that gets back to "on"
541faster than from a full "off".
542
543Some busses define rules about what different suspend states mean. PCI
544gives one example: after the suspend sequence completes, a non-legacy
545PCI device may not perform DMA or issue IRQs, and any wakeup events it
546issues would be issued through the PME# bus signal. Plus, there are
547several PCI-standard device states, some of which are optional.
548
549In contrast, integrated system-on-chip processors often use IRQs as the
550wakeup event sources (so drivers would call enable_irq_wake) and might
551be able to treat DMA completion as a wakeup event (sometimes DMA can stay
552active too, it'd only be the CPU and some peripherals that sleep).
553
554Some details here may be platform-specific. Systems may have devices that
555can be fully active in certain sleep states, such as an LCD display that's
556refreshed using DMA while most of the system is sleeping lightly ... and
557its frame buffer might even be updated by a DSP or other non-Linux CPU while
558the Linux control processor stays idle.
559
560Moreover, the specific actions taken may depend on the target system state.
561One target system state might allow a given device to be very operational;
562another might require a hard shut down with re-initialization on resume.
563And two different target systems might use the same device in different
564ways; the aforementioned LCD might be active in one product's "standby",
565but a different product using the same SOC might work differently.
566
567
609Power Management Notifiers 568Power Management Notifiers
610-------------------------- 569--------------------------
611As stated in Documentation/power/notifiers.txt, there are some operations that 570There are some operations that cannot be carried out by the power management
612cannot be carried out by the power management callbacks discussed above, because 571callbacks discussed above, because the callbacks occur too late or too early.
613carrying them out at these points would be too late or too early. To handle 572To handle these cases, subsystems and device drivers may register power
614these cases subsystems and device drivers may register power management 573management notifiers that are called before tasks are frozen and after they have
615notifiers that are called before tasks are frozen and after they have been 574been thawed. Generally speaking, the PM notifiers are suitable for performing
616thawed. 575actions that either require user space to be available, or at least won't
617 576interfere with user space.
618Generally speaking, the PM notifiers are suitable for performing actions that
619either require user space to be available, or at least won't interfere with user
620space in a wrong way.
621 577
622For details refer to Documentation/power/notifiers.txt. 578For details refer to Documentation/power/notifiers.txt.
623 579
@@ -629,24 +585,23 @@ running. This feature is useful for devices that are not being used, and
629can offer significant power savings on a running system. These devices 585can offer significant power savings on a running system. These devices
630often support a range of runtime power states, which might use names such 586often support a range of runtime power states, which might use names such
631as "off", "sleep", "idle", "active", and so on. Those states will in some 587as "off", "sleep", "idle", "active", and so on. Those states will in some
632cases (like PCI) be partially constrained by a bus the device uses, and will 588cases (like PCI) be partially constrained by the bus the device uses, and will
633usually include hardware states that are also used in system sleep states. 589usually include hardware states that are also used in system sleep states.
634 590
635Note, however, that a system-wide power transition can be started while some 591A system-wide power transition can be started while some devices are in low
636devices are in low power states due to the runtime power management. The system 592power states due to runtime power management. The system sleep PM callbacks
637sleep PM callbacks should generally recognize such situations and react to them 593should recognize such situations and react to them appropriately, but the
638appropriately, but the recommended actions to be taken in that cases are 594necessary actions are subsystem-specific.
639subsystem-specific. 595
640 596In some cases the decision may be made at the subsystem level while in other
641In some cases the decision may be made at the subsystem level while in some 597cases the device driver may be left to decide. In some cases it may be
642other cases the device driver may be left to decide. In some cases it may be 598desirable to leave a suspended device in that state during a system-wide power
643desirable to leave a suspended device in that state during system-wide power 599transition, but in other cases the device must be put back into the full-power
644transition, but in some other cases the device ought to be put back into the 600state temporarily, for example so that its system wakeup capability can be
645full power state, for example to be configured for system wakeup or so that its 601disabled. This all depends on the hardware and the design of the subsystem and
646system wakeup capability can be disabled. That all depends on the hardware 602device driver in question.
647and the design of the subsystem and device driver in question. 603
648 604During system-wide resume from a sleep state it's best to put devices into the
649During system-wide resume from a sleep state it's better to put devices into 605full-power state, as explained in Documentation/power/runtime_pm.txt. Refer to
650the full power state, as explained in Documentation/power/runtime_pm.txt. Refer 606that document for more information regarding this particular issue as well as
651to that document for more information regarding this particular issue as well as
652for information on the device runtime power management framework in general. 607for information on the device runtime power management framework in general.
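The hibernation phase ordering in the updated text (prepare, freeze, freeze_noirq, thaw_noirq, thaw, complete, prepare, poweroff, poweroff_noirq on the way down; restore_noirq, restore, complete in the image kernel) can be illustrated with a small userspace C model. This is a sketch under stated assumptions, not kernel code: struct model_device and struct model_pm_ops are hypothetical stand-ins loosely based on struct device and struct dev_pm_ops, the "foo" driver is invented for illustration, and a NULL callback is simply skipped. It also shows a driver pointing both .resume and .restore at the same routine, as the text suggests many drivers can.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace model only.  These types are hypothetical stand-ins loosely
 * based on struct device and struct dev_pm_ops; no kernel API is used.
 */
struct model_device {
	unsigned int reg;	/* pretend hardware register */
	unsigned int saved_reg;	/* copy kept across the power transition */
	int low_power;		/* 1 while the pretend device is powered down */
	char log[256];		/* phases whose callbacks actually ran */
};

typedef int (*pm_cb)(struct model_device *dev);

struct model_pm_ops {
	pm_cb prepare, suspend, suspend_noirq, resume_noirq, resume;
	pm_cb freeze, freeze_noirq, thaw_noirq, thaw;
	pm_cb poweroff, poweroff_noirq, restore_noirq, restore;
	void (*complete)(struct model_device *dev);
};

static void log_phase(struct model_device *dev, const char *name)
{
	size_t len = strlen(dev->log);

	snprintf(dev->log + len, sizeof(dev->log) - len, "%s ", name);
}

/* Run one phase; a missing (NULL) callback counts as success. */
static int run_phase(struct model_device *dev, const char *name, pm_cb cb)
{
	assert(dev != NULL && name != NULL);
	if (!cb)
		return 0;
	log_phase(dev, name);
	return cb(dev);
}

static void run_complete(struct model_device *dev,
			 const struct model_pm_ops *ops)
{
	if (ops->complete) {
		log_phase(dev, "complete");
		ops->complete(dev);
	}
}

/* Entering hibernation; this model does no error recovery. */
static int enter_hibernation(struct model_device *dev,
			     const struct model_pm_ops *ops)
{
	int error;

	if ((error = run_phase(dev, "prepare", ops->prepare)) ||
	    (error = run_phase(dev, "freeze", ops->freeze)) ||
	    (error = run_phase(dev, "freeze_noirq", ops->freeze_noirq)))
		return error;
	/* The system image would be created here. */
	run_phase(dev, "thaw_noirq", ops->thaw_noirq);	/* errors ignored here */
	run_phase(dev, "thaw", ops->thaw);
	run_complete(dev, ops);
	/* The image would be written to permanent storage here. */
	if ((error = run_phase(dev, "prepare", ops->prepare)) ||
	    (error = run_phase(dev, "poweroff", ops->poweroff)))
		return error;
	return run_phase(dev, "poweroff_noirq", ops->poweroff_noirq);
}

/* Leaving hibernation, as run by the image kernel. */
static void leave_hibernation(struct model_device *dev,
			      const struct model_pm_ops *ops)
{
	run_phase(dev, "restore_noirq", ops->restore_noirq);
	run_phase(dev, "restore", ops->restore);
	run_complete(dev, ops);
}

/* Hypothetical "foo" driver callbacks. */
static int foo_freeze(struct model_device *dev)
{
	dev->saved_reg = dev->reg;	/* quiesce and save, but stay powered */
	return 0;
}

static int foo_thaw(struct model_device *dev)
{
	(void)dev;			/* still powered, nothing to restore */
	return 0;
}

static int foo_poweroff(struct model_device *dev)
{
	dev->low_power = 1;		/* registers were already saved by freeze */
	dev->reg = 0;			/* pretend the contents are lost */
	return 0;
}

static int foo_resume(struct model_device *dev)
{
	dev->low_power = 0;
	dev->reg = dev->saved_reg;
	return 0;
}

static const struct model_pm_ops foo_pm_ops = {
	.resume = foo_resume,
	.freeze = foo_freeze,
	.thaw = foo_thaw,
	.poweroff = foo_poweroff,
	.restore = foo_resume,		/* same routine as .resume */
};
```

Running enter_hibernation() and then leave_hibernation() on a device whose reg starts at 0x42 logs only the phases foo implements (freeze, thaw, poweroff, then restore) and leaves reg back at 0x42 with the device out of its low-power state. A real driver would fill in struct dev_pm_ops and let the PM core drive the phases; this model also ignores the boot kernel's role entirely.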