diff options
Diffstat (limited to 'Documentation/power/basic-pm-debugging.txt')
-rw-r--r-- | Documentation/power/basic-pm-debugging.txt | 216 |
1 files changed, 157 insertions, 59 deletions
diff --git a/Documentation/power/basic-pm-debugging.txt b/Documentation/power/basic-pm-debugging.txt index 57aef2f6e0de..1555001bc733 100644 --- a/Documentation/power/basic-pm-debugging.txt +++ b/Documentation/power/basic-pm-debugging.txt | |||
@@ -1,45 +1,111 @@ | |||
1 | Debugging suspend and resume | 1 | Debugging hibernation and suspend |
2 | (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL | 2 | (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL |
3 | 3 | ||
4 | 1. Testing suspend to disk (STD) | 4 | 1. Testing hibernation (aka suspend to disk or STD) |
5 | 5 | ||
6 | To verify that the STD works, you can try to suspend in the "reboot" mode: | 6 | To check if hibernation works, you can try to hibernate in the "reboot" mode: |
7 | 7 | ||
8 | # echo reboot > /sys/power/disk | 8 | # echo reboot > /sys/power/disk |
9 | # echo disk > /sys/power/state | 9 | # echo disk > /sys/power/state |
10 | 10 | ||
11 | and the system should suspend, reboot, resume and get back to the command prompt | 11 | and the system should create a hibernation image, reboot, resume and get back to |
12 | where you have started the transition. If that happens, the STD is most likely | 12 | the command prompt where you have started the transition. If that happens, |
13 | to work correctly, but you need to repeat the test at least a couple of times in | 13 | hibernation is most likely to work correctly. Still, you need to repeat the |
14 | a row for confidence. This is necessary, because some problems only show up on | 14 | test at least a couple of times in a row for confidence. [This is necessary, |
15 | a second attempt at suspending and resuming the system. You should also test | 15 | because some problems only show up on a second attempt at suspending and |
16 | the "platform" and "shutdown" modes of suspend: | 16 | resuming the system.] Moreover, hibernating in the "reboot" and "shutdown" |
17 | modes causes the PM core to skip some platform-related callbacks which on ACPI | ||
18 | systems might be necessary to make hibernation work. Thus, if you machine fails | ||
19 | to hibernate or resume in the "reboot" mode, you should try the "platform" mode: | ||
17 | 20 | ||
18 | # echo platform > /sys/power/disk | 21 | # echo platform > /sys/power/disk |
19 | # echo disk > /sys/power/state | 22 | # echo disk > /sys/power/state |
20 | 23 | ||
21 | or | 24 | which is the default and recommended mode of hibernation. |
25 | |||
26 | Unfortunately, the "platform" mode of hibernation does not work on some systems | ||
27 | with broken BIOSes. In such cases the "shutdown" mode of hibernation might | ||
28 | work: | ||
22 | 29 | ||
23 | # echo shutdown > /sys/power/disk | 30 | # echo shutdown > /sys/power/disk |
24 | # echo disk > /sys/power/state | 31 | # echo disk > /sys/power/state |
25 | 32 | ||
26 | in which cases you will have to press the power button to make the system | 33 | (it is similar to the "reboot" mode, but it requires you to press the power |
27 | resume. If that does not work, you will need to identify what goes wrong. | 34 | button to make the system resume). |
35 | |||
36 | If neither "platform" nor "shutdown" hibernation mode works, you will need to | ||
37 | identify what goes wrong. | ||
38 | |||
39 | a) Test modes of hibernation | ||
40 | |||
41 | To find out why hibernation fails on your system, you can use a special testing | ||
42 | facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then, | ||
43 | there is the file /sys/power/pm_test that can be used to make the hibernation | ||
44 | core run in a test mode. There are 5 test modes available: | ||
45 | |||
46 | freezer | ||
47 | - test the freezing of processes | ||
48 | |||
49 | devices | ||
50 | - test the freezing of processes and suspending of devices | ||
28 | 51 | ||
29 | a) Test mode of STD | 52 | platform |
53 | - test the freezing of processes, suspending of devices and platform | ||
54 | global control methods(*) | ||
30 | 55 | ||
31 | To verify if there are any drivers that cause problems you can run the STD | 56 | processors |
32 | in the test mode: | 57 | - test the freezing of processes, suspending of devices, platform |
58 | global control methods(*) and the disabling of nonboot CPUs | ||
33 | 59 | ||
34 | # echo test > /sys/power/disk | 60 | core |
61 | - test the freezing of processes, suspending of devices, platform global | ||
62 | control methods(*), the disabling of nonboot CPUs and suspending of | ||
63 | platform/system devices | ||
64 | |||
65 | (*) the platform global control methods are only available on ACPI systems | ||
66 | and are only tested if the hibernation mode is set to "platform" | ||
67 | |||
68 | To use one of them it is necessary to write the corresponding string to | ||
69 | /sys/power/pm_test (eg. "devices" to test the freezing of processes and | ||
70 | suspending devices) and issue the standard hibernation commands. For example, | ||
71 | to use the "devices" test mode along with the "platform" mode of hibernation, | ||
72 | you should do the following: | ||
73 | |||
74 | # echo devices > /sys/power/pm_test | ||
75 | # echo platform > /sys/power/disk | ||
35 | # echo disk > /sys/power/state | 76 | # echo disk > /sys/power/state |
36 | 77 | ||
37 | in which case the system should freeze tasks, suspend devices, disable nonboot | 78 | Then, the kernel will try to freeze processes, suspend devices, wait 5 seconds, |
38 | CPUs (if any), wait for 5 seconds, enable nonboot CPUs, resume devices, thaw | 79 | resume devices and thaw processes. If "platform" is written to |
39 | tasks and return to your command prompt. If that fails, most likely there is | 80 | /sys/power/pm_test , then after suspending devices the kernel will additionally |
40 | a driver that fails to either suspend or resume (in the latter case the system | 81 | invoke the global control methods (eg. ACPI global control methods) used to |
41 | may hang or be unstable after the test, so please take that into consideration). | 82 | prepare the platform firmware for hibernation. Next, it will wait 5 seconds and |
42 | To find this driver, you can carry out a binary search according to the rules: | 83 | invoke the platform (eg. ACPI) global methods used to cancel hibernation etc. |
84 | |||
85 | Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal | ||
86 | hibernation/suspend operations. Also, when open for reading, /sys/power/pm_test | ||
87 | contains a space-separated list of all available tests (including "none" that | ||
88 | represents the normal functionality) in which the current test level is | ||
89 | indicated by square brackets. | ||
90 | |||
91 | Generally, as you can see, each test level is more "invasive" than the previous | ||
92 | one and the "core" level tests the hardware and drivers as deeply as possible | ||
93 | without creating a hibernation image. Obviously, if the "devices" test fails, | ||
94 | the "platform" test will fail as well and so on. Thus, as a rule of thumb, you | ||
95 | should try the test modes starting from "freezer", through "devices", "platform" | ||
96 | and "processors" up to "core" (repeat the test on each level a couple of times | ||
97 | to make sure that any random factors are avoided). | ||
98 | |||
99 | If the "freezer" test fails, there is a task that cannot be frozen (in that case | ||
100 | it usually is possible to identify the offending task by analysing the output of | ||
101 | dmesg obtained after the failing test). Failure at this level usually means | ||
102 | that there is a problem with the tasks freezer subsystem that should be | ||
103 | reported. | ||
104 | |||
105 | If the "devices" test fails, most likely there is a driver that cannot suspend | ||
106 | or resume its device (in the latter case the system may hang or become unstable | ||
107 | after the test, so please take that into consideration). To find this driver, | ||
108 | you can carry out a binary search according to the rules: | ||
43 | - if the test fails, unload a half of the drivers currently loaded and repeat | 109 | - if the test fails, unload a half of the drivers currently loaded and repeat |
44 | (that would probably involve rebooting the system, so always note what drivers | 110 | (that would probably involve rebooting the system, so always note what drivers |
45 | have been loaded before the test), | 111 | have been loaded before the test), |
@@ -47,23 +113,46 @@ have been loaded before the test), | |||
47 | recently and repeat. | 113 | recently and repeat. |
48 | 114 | ||
49 | Once you have found the failing driver (there can be more than just one of | 115 | Once you have found the failing driver (there can be more than just one of |
50 | them), you have to unload it every time before the STD transition. In that case | 116 | them), you have to unload it every time before hibernation. In that case please |
51 | please make sure to report the problem with the driver. | 117 | make sure to report the problem with the driver. |
52 | 118 | ||
53 | It is also possible that a cycle can still fail after you have unloaded | 119 | It is also possible that the "devices" test will still fail after you have |
54 | all modules. In that case, you would want to look in your kernel configuration | 120 | unloaded all modules. In that case, you may want to look in your kernel |
55 | for the drivers that can be compiled as modules (testing again with them as | 121 | configuration for the drivers that can be compiled as modules (and test again |
56 | modules), and possibly also try boot time options such as "noapic" or "noacpi". | 122 | with these drivers compiled as modules). You may also try to use some special |
123 | kernel command line options such as "noapic", "noacpi" or even "acpi=off". | ||
124 | |||
125 | If the "platform" test fails, there is a problem with the handling of the | ||
126 | platform (eg. ACPI) firmware on your system. In that case the "platform" mode | ||
127 | of hibernation is not likely to work. You can try the "shutdown" mode, but that | ||
128 | is rather a poor man's workaround. | ||
129 | |||
130 | If the "processors" test fails, the disabling/enabling of nonboot CPUs does not | ||
131 | work (of course, this only may be an issue on SMP systems) and the problem | ||
132 | should be reported. In that case you can also try to switch the nonboot CPUs | ||
133 | off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and | ||
134 | see if that works. | ||
135 | |||
136 | If the "core" test fails, which means that suspending of the system/platform | ||
137 | devices has failed (these devices are suspended on one CPU with interrupts off), | ||
138 | the problem is most probably hardware-related and serious, so it should be | ||
139 | reported. | ||
140 | |||
141 | A failure of any of the "platform", "processors" or "core" tests may cause your | ||
142 | system to hang or become unstable, so please beware. Such a failure usually | ||
143 | indicates a serious problem that very well may be related to the hardware, but | ||
144 | please report it anyway. | ||
57 | 145 | ||
58 | b) Testing minimal configuration | 146 | b) Testing minimal configuration |
59 | 147 | ||
60 | If the test mode of STD works, you can boot the system with "init=/bin/bash" | 148 | If all of the hibernation test modes work, you can boot the system with the |
61 | and attempt to suspend in the "reboot", "shutdown" and "platform" modes. If | 149 | "init=/bin/bash" command line parameter and attempt to hibernate in the |
62 | that does not work, there probably is a problem with a driver statically | 150 | "reboot", "shutdown" and "platform" modes. If that does not work, there |
63 | compiled into the kernel and you can try to compile more drivers as modules, | 151 | probably is a problem with a driver statically compiled into the kernel and you |
64 | so that they can be tested individually. Otherwise, there is a problem with a | 152 | can try to compile more drivers as modules, so that they can be tested |
65 | modular driver and you can find it by loading a half of the modules you normally | 153 | individually. Otherwise, there is a problem with a modular driver and you can |
66 | use and binary searching in accordance with the algorithm: | 154 | find it by loading a half of the modules you normally use and binary searching |
155 | in accordance with the algorithm: | ||
67 | - if there are n modules loaded and the attempt to suspend and resume fails, | 156 | - if there are n modules loaded and the attempt to suspend and resume fails, |
68 | unload n/2 of the modules and try again (that would probably involve rebooting | 157 | unload n/2 of the modules and try again (that would probably involve rebooting |
69 | the system), | 158 | the system), |
@@ -71,19 +160,19 @@ the system), | |||
71 | load n/2 modules more and try again. | 160 | load n/2 modules more and try again. |
72 | 161 | ||
73 | Again, if you find the offending module(s), it(they) must be unloaded every time | 162 | Again, if you find the offending module(s), it(they) must be unloaded every time |
74 | before the STD transition, and please report the problem with it(them). | 163 | before hibernation, and please report the problem with it(them). |
75 | 164 | ||
76 | c) Advanced debugging | 165 | c) Advanced debugging |
77 | 166 | ||
78 | In case the STD does not work on your system even in the minimal configuration | 167 | In case that hibernation does not work on your system even in the minimal |
79 | and compiling more drivers as modules is not practical or some modules cannot | 168 | configuration and compiling more drivers as modules is not practical or some |
80 | be unloaded, you can use one of the more advanced debugging techniques to find | 169 | modules cannot be unloaded, you can use one of the more advanced debugging |
81 | the problem. First, if there is a serial port in your box, you can boot the | 170 | techniques to find the problem. First, if there is a serial port in your box, |
82 | kernel with the 'no_console_suspend' parameter and try to log kernel | 171 | you can boot the kernel with the 'no_console_suspend' parameter and try to log |
83 | messages using the serial console. This may provide you with some information | 172 | kernel messages using the serial console. This may provide you with some |
84 | about the reasons of the suspend (resume) failure. Alternatively, it may be | 173 | information about the reasons of the suspend (resume) failure. Alternatively, |
85 | possible to use a FireWire port for debugging with firescope | 174 | it may be possible to use a FireWire port for debugging with firescope |
86 | (ftp://ftp.firstfloor.org/pub/ak/firescope/). On i386 it is also possible to | 175 | (ftp://ftp.firstfloor.org/pub/ak/firescope/). On x86 it is also possible to |
87 | use the PM_TRACE mechanism documented in Documentation/s2ram.txt . | 176 | use the PM_TRACE mechanism documented in Documentation/s2ram.txt . |
88 | 177 | ||
89 | 2. Testing suspend to RAM (STR) | 178 | 2. Testing suspend to RAM (STR) |
@@ -91,16 +180,25 @@ use the PM_TRACE mechanism documented in Documentation/s2ram.txt . | |||
91 | To verify that the STR works, it is generally more convenient to use the s2ram | 180 | To verify that the STR works, it is generally more convenient to use the s2ram |
92 | tool available from http://suspend.sf.net and documented at | 181 | tool available from http://suspend.sf.net and documented at |
93 | http://en.opensuse.org/s2ram . However, before doing that it is recommended to | 182 | http://en.opensuse.org/s2ram . However, before doing that it is recommended to |
94 | carry out the procedure described in section 1. | 183 | carry out STR testing using the facility described in section 1. |
95 | 184 | ||
96 | Assume you have resolved the problems with the STD and you have found some | 185 | Namely, after writing "freezer", "devices", "platform", "processors", or "core" |
97 | failing drivers. These drivers are also likely to fail during the STR or | 186 | into /sys/power/pm_test (available if the kernel is compiled with |
98 | during the resume, so it is better to unload them every time before the STR | 187 | CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding |
99 | transition. Now, you can follow the instructions at | 188 | to given string. The STR test modes are defined in the same way as for |
100 | http://en.opensuse.org/s2ram to test the system, but if it does not work | 189 | hibernation, so please refer to Section 1 for more information about them. In |
101 | "out of the box", you may need to boot it with "init=/bin/bash" and test | 190 | particular, the "core" test allows you to test everything except for the actual |
102 | s2ram in the minimal configuration. In that case, you may be able to search | 191 | invocation of the platform firmware in order to put the system into the sleep |
103 | for failing drivers by following the procedure analogous to the one described in | 192 | state. |
104 | 1b). If you find some failing drivers, you will have to unload them every time | 193 | |
105 | before the STR transition (ie. before you run s2ram), and please report the | 194 | Among other things, the testing with the help of /sys/power/pm_test may allow |
106 | problems with them. | 195 | you to identify drivers that fail to suspend or resume their devices. They |
196 | should be unloaded every time before an STR transition. | ||
197 | |||
198 | Next, you can follow the instructions at http://en.opensuse.org/s2ram to test | ||
199 | the system, but if it does not work "out of the box", you may need to boot it | ||
200 | with "init=/bin/bash" and test s2ram in the minimal configuration. In that | ||
201 | case, you may be able to search for failing drivers by following the procedure | ||
202 | analogous to the one described in section 1. If you find some failing drivers, | ||
203 | you will have to unload them every time before an STR transition (ie. before | ||
204 | you run s2ram), and please report the problems with them. | ||