Diffstat (limited to 'Documentation')
24 files changed, 1383 insertions, 368 deletions
diff --git a/Documentation/ABI/obsolete/devfs b/Documentation/ABI/removed/devfs index b8b87399bc8f..8195c4e0d0a1 100644 --- a/Documentation/ABI/obsolete/devfs +++ b/Documentation/ABI/removed/devfs | |||
@@ -1,13 +1,12 @@ | |||
1 | What: devfs | 1 | What: devfs |
2 | Date: July 2005 | 2 | Date: July 2005 (scheduled), finally removed in kernel v2.6.18 |
3 | Contact: Greg Kroah-Hartman <gregkh@suse.de> | 3 | Contact: Greg Kroah-Hartman <gregkh@suse.de> |
4 | Description: | 4 | Description: |
5 | devfs has been unmaintained for a number of years, has unfixable | 5 | devfs has been unmaintained for a number of years, has unfixable |
6 | races, contains a naming policy within the kernel that is | 6 | races, contains a naming policy within the kernel that is |
7 | against the LSB, and can be replaced by using udev. | 7 | against the LSB, and can be replaced by using udev. |
8 | The files fs/devfs/*, include/linux/devfs_fs*.h will be removed, | 8 | The files fs/devfs/*, include/linux/devfs_fs*.h were removed, |
9 | along with the assorted devfs function calls throughout the | 9 | along with the assorted devfs function calls throughout the |
10 | kernel tree. | 10 | kernel tree. |
11 | 11 | ||
12 | Users: | 12 | Users: |
13 | |||
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power new file mode 100644 index 000000000000..d882f8093871 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-power | |||
@@ -0,0 +1,88 @@ | |||
1 | What: /sys/power/ | ||
2 | Date: August 2006 | ||
3 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
4 | Description: | ||
5 | The /sys/power directory will contain files that will | ||
6 | provide a unified interface to the power management | ||
7 | subsystem. | ||
8 | |||
9 | What: /sys/power/state | ||
10 | Date: August 2006 | ||
11 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
12 | Description: | ||
13 | The /sys/power/state file controls the system power state. | ||
14 | Reading from this file returns what states are supported, | ||
15 | which is hard-coded to 'standby' (Power-On Suspend), 'mem' | ||
16 | (Suspend-to-RAM), and 'disk' (Suspend-to-Disk). | ||
17 | |||
18 | Writing to this file one of these strings causes the system to | ||
19 | transition into that state. Please see the file | ||
20 | Documentation/power/states.txt for a description of each of | ||
21 | these states. | ||
22 | |||
23 | What: /sys/power/disk | ||
24 | Date: August 2006 | ||
25 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
26 | Description: | ||
27 | The /sys/power/disk file controls the operating mode of the | ||
28 | suspend-to-disk mechanism. Reading from this file returns | ||
29 | the name of the method by which the system will be put to | ||
30 | sleep on the next suspend. There are four methods supported: | ||
31 | 'firmware' - means that the memory image will be saved to disk | ||
32 | by some firmware, in which case we also assume that the | ||
33 | firmware will handle the system suspend. | ||
34 | 'platform' - the memory image will be saved by the kernel and | ||
35 | the system will be put to sleep by the platform driver (e.g. | ||
36 | ACPI or other PM registers). | ||
37 | 'shutdown' - the memory image will be saved by the kernel and | ||
38 | the system will be powered off. | ||
39 | 'reboot' - the memory image will be saved by the kernel and | ||
40 | the system will be rebooted. | ||
41 | |||
42 | The suspend-to-disk method may be chosen by writing to this | ||
43 | file one of the accepted strings: | ||
44 | |||
45 | 'firmware' | ||
46 | 'platform' | ||
47 | 'shutdown' | ||
48 | 'reboot' | ||
49 | |||
50 | It will only change to 'firmware' or 'platform' if the system | ||
51 | supports that. | ||
52 | |||
53 | What: /sys/power/image_size | ||
54 | Date: August 2006 | ||
55 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
56 | Description: | ||
57 | The /sys/power/image_size file controls the size of the image | ||
58 | created by the suspend-to-disk mechanism. It can be written a | ||
59 | string representing a non-negative integer that will be used | ||
60 | as an upper limit of the image size, in bytes. The kernel's | ||
61 | suspend-to-disk code will do its best to ensure the image size | ||
62 | will not exceed this number. However, if it turns out to be | ||
63 | impossible, the kernel will try to suspend anyway using the | ||
64 | smallest image possible. In particular, if "0" is written to | ||
65 | this file, the suspend image will be as small as possible. | ||
66 | |||
67 | Reading from this file will display the current image size | ||
68 | limit, which is set to 500 MB by default. | ||
69 | |||
70 | What: /sys/power/pm_trace | ||
71 | Date: August 2006 | ||
72 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
73 | Description: | ||
74 | The /sys/power/pm_trace file controls the code which saves the | ||
75 | last PM event point in the RTC across reboots, so that you can | ||
76 | debug a machine that just hangs during suspend (or more | ||
77 | commonly, during resume). Namely, the RTC is only used to save | ||
78 | the last PM event point if this file contains '1'. Initially | ||
79 | it contains '0' which may be changed to '1' by writing a | ||
80 | string representing a nonzero integer into it. | ||
81 | |||
82 | To use this debugging feature you should attempt to suspend | ||
83 | the machine, then reboot it and run | ||
84 | |||
85 | dmesg -s 1000000 | grep 'hash matches' | ||
86 | |||
87 | CAUTION: Using it will cause your machine's real-time (CMOS) | ||
88 | clock to be set to a random invalid time after a resume. | ||
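The /sys/power/state description above says that reading the file lists the supported states and that writing one of those strings triggers the transition. A minimal sketch of that read-check-write sequence follows. The helper names `state_supported` and `request_sleep_state` are hypothetical and not part of any kernel or libc API, and the sketch assumes the file holds a single whitespace-separated line.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `wanted` appears as a whole word in the whitespace-separated
 * list `states` (the text read from /sys/power/state), otherwise 0. */
int state_supported(const char *states, const char *wanted)
{
    size_t wlen = strlen(wanted);
    const char *p = states;

    if (wlen == 0)
        return 0;
    while (*p) {
        size_t len;

        p += strspn(p, " \t\n");      /* skip separators */
        len = strcspn(p, " \t\n");    /* length of the next word */
        if (len == wlen && strncmp(p, wanted, wlen) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Hypothetical helper: read the state list from `path` and, only if `wanted`
 * is listed, write it back. Returns 0 on success, -1 on any failure.
 * Writing to the real file needs root and actually suspends the machine. */
int request_sleep_state(const char *path, const char *wanted)
{
    char buf[128];
    int ok;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (!state_supported(buf, wanted))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs(wanted, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would typically invoke `request_sleep_state("/sys/power/state", "mem")`. Checking the list first avoids writing a string the kernel would reject.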
diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl index 320af25de3a2..3608472d7b74 100644 --- a/Documentation/DocBook/usb.tmpl +++ b/Documentation/DocBook/usb.tmpl | |||
@@ -43,59 +43,52 @@ | |||
43 | 43 | ||
44 | <para>A Universal Serial Bus (USB) is used to connect a host, | 44 | <para>A Universal Serial Bus (USB) is used to connect a host, |
45 | such as a PC or workstation, to a number of peripheral | 45 | such as a PC or workstation, to a number of peripheral |
46 | devices. USB uses a tree structure, with the host at the | 46 | devices. USB uses a tree structure, with the host as the |
47 | root (the system's master), hubs as interior nodes, and | 47 | root (the system's master), hubs as interior nodes, and |
48 | peripheral devices as leaves (and slaves). | 48 | peripherals as leaves (and slaves). |
49 | Modern PCs support several such trees of USB devices, usually | 49 | Modern PCs support several such trees of USB devices, usually |
50 | one USB 2.0 tree (480 Mbit/sec each) with | 50 | one USB 2.0 tree (480 Mbit/sec each) with |
51 | a few USB 1.1 trees (12 Mbit/sec each) that are used when you | 51 | a few USB 1.1 trees (12 Mbit/sec each) that are used when you |
52 | connect a USB 1.1 device directly to the machine's "root hub". | 52 | connect a USB 1.1 device directly to the machine's "root hub". |
53 | </para> | 53 | </para> |
54 | 54 | ||
55 | <para>That master/slave asymmetry was designed in part for | 55 | <para>That master/slave asymmetry was designed-in for a number of |
56 | ease of use. It is not physically possible to assemble | 56 | reasons, one being ease of use. It is not physically possible to |
57 | (legal) USB cables incorrectly: all upstream "to-the-host" | 57 | assemble (legal) USB cables incorrectly: all upstream "to the host" |
58 | connectors are the rectangular type, matching the sockets on | 58 | connectors are the rectangular type (matching the sockets on |
59 | root hubs, and the downstream type are the squarish type | 59 | root hubs), and all downstream connectors are the squarish type |
60 | (or they are built in to the peripheral). | 60 | (or they are built into the peripheral). |
61 | Software doesn't need to deal with distributed autoconfiguration | 61 | Also, the host software doesn't need to deal with distributed |
62 | since the pre-designated master node manages all that. | 62 | auto-configuration since the pre-designated master node manages all that. |
63 | At the electrical level, bus protocol overhead is reduced by | 63 | And finally, at the electrical level, bus protocol overhead is reduced by |
64 | eliminating arbitration and moving scheduling into host software. | 64 | eliminating arbitration and moving scheduling into the host software. |
65 | </para> | 65 | </para> |
66 | 66 | ||
67 | <para>USB 1.0 was announced in January 1996, and was revised | 67 | <para>USB 1.0 was announced in January 1996 and was revised |
68 | as USB 1.1 (with improvements in hub specification and | 68 | as USB 1.1 (with improvements in hub specification and |
69 | support for interrupt-out transfers) in September 1998. | 69 | support for interrupt-out transfers) in September 1998. |
70 | USB 2.0 was released in April 2000, including high speed | 70 | USB 2.0 was released in April 2000, adding high-speed |
71 | transfers and transaction translating hubs (used for USB 1.1 | 71 | transfers and transaction-translating hubs (used for USB 1.1 |
72 | and 1.0 backward compatibility). | 72 | and 1.0 backward compatibility). |
73 | </para> | 73 | </para> |
74 | 74 | ||
75 | <para>USB support was added to Linux early in the 2.2 kernel series | 75 | <para>Kernel developers added USB support to Linux early in the 2.2 kernel |
76 | shortly before the 2.3 development forked off. Updates | 76 | series, shortly before 2.3 development forked. Updates from 2.3 were |
77 | from 2.3 were regularly folded back into 2.2 releases, bringing | 77 | regularly folded back into 2.2 releases, which improved reliability and |
78 | new features such as <filename>/sbin/hotplug</filename> support, | 78 | brought <filename>/sbin/hotplug</filename> support as well as more drivers. |
79 | more drivers, and more robustness. | 79 | Such improvements were continued in the 2.5 kernel series, where they added |
80 | The 2.5 kernel series continued such improvements, and also | 80 | USB 2.0 support, improved performance, and made the host controller drivers |
81 | worked on USB 2.0 support, | 81 | (HCDs) more consistent. They also simplified the API (to make bugs less |
82 | higher performance, | 82 | likely) and added internal "kerneldoc" documentation. |
83 | better consistency between host controller drivers, | ||
84 | API simplification (to make bugs less likely), | ||
85 | and providing internal "kerneldoc" documentation. | ||
86 | </para> | 83 | </para> |
87 | 84 | ||
88 | <para>Linux can run inside USB devices as well as on | 85 | <para>Linux can run inside USB devices as well as on |
89 | the hosts that control the devices. | 86 | the hosts that control the devices. |
90 | Because the Linux 2.x USB support evolved to support mass market | 87 | But USB device drivers running inside those peripherals |
91 | platforms such as Apple Macintosh or PC-compatible systems, | ||
92 | it didn't address design concerns for those types of USB systems. | ||
93 | So it can't be used inside mass-market PDAs, or other peripherals. | ||
94 | USB device drivers running inside those Linux peripherals | ||
95 | don't do the same things as the ones running inside hosts, | 88 | don't do the same things as the ones running inside hosts, |
96 | and so they've been given a different name: | 89 | so they've been given a different name: |
97 | they're called <emphasis>gadget drivers</emphasis>. | 90 | <emphasis>gadget drivers</emphasis>. |
98 | This document does not present gadget drivers. | 91 | This document does not cover gadget drivers. |
99 | </para> | 92 | </para> |
100 | 93 | ||
101 | </chapter> | 94 | </chapter> |
@@ -103,17 +96,14 @@ | |||
103 | <chapter id="host"> | 96 | <chapter id="host"> |
104 | <title>USB Host-Side API Model</title> | 97 | <title>USB Host-Side API Model</title> |
105 | 98 | ||
106 | <para>Within the kernel, | 99 | <para>Host-side drivers for USB devices talk to the "usbcore" APIs. |
107 | host-side drivers for USB devices talk to the "usbcore" APIs. | 100 | There are two. One is intended for |
108 | There are two types of public "usbcore" APIs, targetted at two different | 101 | <emphasis>general-purpose</emphasis> drivers (exposed through |
109 | layers of USB driver. Those are | 102 | driver frameworks), and the other is for drivers that are |
110 | <emphasis>general purpose</emphasis> drivers, exposed through | 103 | <emphasis>part of the core</emphasis>. |
111 | driver frameworks such as block, character, or network devices; | 104 | Such core drivers include the <emphasis>hub</emphasis> driver |
112 | and drivers that are <emphasis>part of the core</emphasis>, | 105 | (which manages trees of USB devices) and several different kinds |
113 | which are involved in managing a USB bus. | 106 | of <emphasis>host controller drivers</emphasis>, |
114 | Such core drivers include the <emphasis>hub</emphasis> driver, | ||
115 | which manages trees of USB devices, and several different kinds | ||
116 | of <emphasis>host controller driver (HCD)</emphasis>, | ||
117 | which control individual busses. | 107 | which control individual busses. |
118 | </para> | 108 | </para> |
119 | 109 | ||
@@ -122,21 +112,21 @@ | |||
122 | 112 | ||
123 | <itemizedlist> | 113 | <itemizedlist> |
124 | 114 | ||
125 | <listitem><para>USB supports four kinds of data transfer | 115 | <listitem><para>USB supports four kinds of data transfers |
126 | (control, bulk, interrupt, and isochronous). Two transfer | 116 | (control, bulk, interrupt, and isochronous). Two of them (control |
127 | types use bandwidth as it's available (control and bulk), | 117 | and bulk) use bandwidth as it's available, |
128 | while the other two types of transfer (interrupt and isochronous) | 118 | while the other two (interrupt and isochronous) |
129 | are scheduled to provide guaranteed bandwidth. | 119 | are scheduled to provide guaranteed bandwidth. |
130 | </para></listitem> | 120 | </para></listitem> |
131 | 121 | ||
132 | <listitem><para>The device description model includes one or more | 122 | <listitem><para>The device description model includes one or more |
133 | "configurations" per device, only one of which is active at a time. | 123 | "configurations" per device, only one of which is active at a time. |
134 | Devices that are capable of high speed operation must also support | 124 | Devices that are capable of high-speed operation must also support |
135 | full speed configurations, along with a way to ask about the | 125 | full-speed configurations, along with a way to ask about the |
136 | "other speed" configurations that might be used. | 126 | "other speed" configurations which might be used. |
137 | </para></listitem> | 127 | </para></listitem> |
138 | 128 | ||
139 | <listitem><para>Configurations have one or more "interface", each | 129 | <listitem><para>Configurations have one or more "interfaces", each |
140 | of which may have "alternate settings". Interfaces may be | 130 | of which may have "alternate settings". Interfaces may be |
141 | standardized by USB "Class" specifications, or may be specific to | 131 | standardized by USB "Class" specifications, or may be specific to |
142 | a vendor or device.</para> | 132 | a vendor or device.</para> |
@@ -162,7 +152,7 @@ | |||
162 | </para></listitem> | 152 | </para></listitem> |
163 | 153 | ||
164 | <listitem><para>The Linux USB API supports synchronous calls for | 154 | <listitem><para>The Linux USB API supports synchronous calls for |
165 | control and bulk messaging. | 155 | control and bulk messages. |
166 | It also supports asynchronous calls for all kinds of data transfer, | 156 | It also supports asynchronous calls for all kinds of data transfer, |
167 | using request structures called "URBs" (USB Request Blocks). | 157 | using request structures called "URBs" (USB Request Blocks). |
168 | </para></listitem> | 158 | </para></listitem> |
@@ -463,14 +453,25 @@ | |||
463 | file in your Linux kernel sources. | 453 | file in your Linux kernel sources. |
464 | </para> | 454 | </para> |
465 | 455 | ||
466 | <para>Otherwise the main use for this file from programs | 456 | <para>This file, in combination with the poll() system call, can |
467 | is to poll() it to get notifications of usb devices | 457 | also be used to detect when devices are added or removed: |
468 | as they're plugged or unplugged. | 458 | <programlisting>int fd; |
469 | To see what changed, you'd need to read the file and | 459 | struct pollfd pfd; |
470 | compare "before" and "after" contents, scan the filesystem, | 460 | |
471 | or see its hotplug event. | 461 | fd = open("/proc/bus/usb/devices", O_RDONLY); |
462 | pfd = (struct pollfd){ fd, POLLIN, 0 }; | ||
463 | for (;;) { | ||
464 | /* The first time through, this call will return immediately. */ | ||
465 | poll(&pfd, 1, -1); | ||
466 | |||
467 | /* To see what's changed, compare the file's previous and current | ||
468 | contents or scan the filesystem. (Scanning is more precise.) */ | ||
469 | }</programlisting> | ||
470 | Note that this behavior is intended to be used for informational | ||
471 | and debug purposes. It would be more appropriate to use programs | ||
472 | such as udev or HAL to initialize a device or start a user-mode | ||
473 | helper program, for instance. | ||
472 | </para> | 474 | </para> |
473 | |||
474 | </sect1> | 475 | </sect1> |
475 | 476 | ||
476 | <sect1> | 477 | <sect1> |
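The poll() loop added to usb.tmpl in this hunk is an outline rather than a complete program. A compilable sketch of its waiting step might look like the following. The function name `wait_for_event` and its return convention are assumptions made for illustration; only `poll()` and `struct pollfd` come from POSIX.

```c
#include <poll.h>

/* Block until `fd` reports POLLIN (or an error condition) or `timeout_ms`
 * elapses; a timeout of -1 waits indefinitely.
 * Returns 1 if an event is pending, 0 on timeout, -1 if poll() failed. */
int wait_for_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

In the documented use case the descriptor would come from `open("/proc/bus/usb/devices", O_RDONLY)`, and each return of 1 would be followed by re-reading the file or rescanning the filesystem to see what changed.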
diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 915ae8c986c6..1d6560413cc5 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO | |||
@@ -358,7 +358,8 @@ Here is a list of some of the different kernel trees available: | |||
358 | quilt trees: | 358 | quilt trees: |
359 | - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> | 359 | - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> |
360 | kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ | 360 | kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ |
361 | 361 | - x86-64, partly i386, Andi Kleen <ak@suse.de> | |
362 | ftp.firstfloor.org:/pub/ak/x86_64/quilt/ | ||
362 | 363 | ||
363 | Bug Reporting | 364 | Bug Reporting |
364 | ------------- | 365 | ------------- |
diff --git a/Documentation/devices.txt b/Documentation/devices.txt index 66c725f530f3..addc67b1d770 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt | |||
@@ -2543,6 +2543,9 @@ Your cooperation is appreciated. | |||
2543 | 64 = /dev/usb/rio500 Diamond Rio 500 | 2543 | 64 = /dev/usb/rio500 Diamond Rio 500 |
2544 | 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) | 2544 | 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) |
2545 | 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) | 2545 | 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) |
2546 | 67 = /dev/usb/adutux0 1st Ontrak ADU device | ||
2547 | ... | ||
2548 | 76 = /dev/usb/adutux10 10th Ontrak ADU device | ||
2546 | 96 = /dev/usb/hiddev0 1st USB HID device | 2549 | 96 = /dev/usb/hiddev0 1st USB HID device |
2547 | ... | 2550 | ... |
2548 | 111 = /dev/usb/hiddev15 16th USB HID device | 2551 | 111 = /dev/usb/hiddev15 16th USB HID device |
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 552507fe9a7e..436697cb9388 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -6,6 +6,21 @@ be removed from this file. | |||
6 | 6 | ||
7 | --------------------------- | 7 | --------------------------- |
8 | 8 | ||
9 | What: /sys/devices/.../power/state | ||
10 | dev->power.power_state | ||
11 | dpm_runtime_{suspend,resume}() | ||
12 | When: July 2007 | ||
13 | Why: Broken design for runtime control over driver power states, confusing | ||
14 | driver-internal runtime power management with: mechanisms to support | ||
15 | system-wide sleep state transitions; event codes that distinguish | ||
16 | different phases of swsusp "sleep" transitions; and userspace policy | ||
17 | inputs. This framework was never widely used, and most attempts to | ||
18 | use it were broken. Drivers should instead be exposing domain-specific | ||
19 | interfaces either to kernel or to userspace. | ||
20 | Who: Pavel Machek <pavel@suse.cz> | ||
21 | |||
22 | --------------------------- | ||
23 | |||
9 | What: RAW driver (CONFIG_RAW_DRIVER) | 24 | What: RAW driver (CONFIG_RAW_DRIVER) |
10 | When: December 2005 | 25 | When: December 2005 |
11 | Why: declared obsolete since kernel 2.6.3 | 26 | Why: declared obsolete since kernel 2.6.3 |
@@ -55,6 +70,18 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br> | |||
55 | 70 | ||
56 | --------------------------- | 71 | --------------------------- |
57 | 72 | ||
73 | What: sys_sysctl | ||
74 | When: January 2007 | ||
75 | Why: The same information is available through /proc/sys and that is the | ||
76 | interface user space prefers to use. There do not appear to be | ||
77 | any existing users of sys_sysctl in user space. The additional | ||
78 | maintenance overhead of keeping a set of binary names gets | ||
79 | in the way of doing a good job of maintaining this interface. | ||
80 | |||
81 | Who: Eric Biederman <ebiederm@xmission.com> | ||
82 | |||
83 | --------------------------- | ||
84 | |||
58 | What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) | 85 | What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) |
59 | When: November 2005 | 86 | When: November 2005 |
60 | Files: drivers/pcmcia/: pcmcia_ioctl.c | 87 | Files: drivers/pcmcia/: pcmcia_ioctl.c |
@@ -202,14 +229,6 @@ Who: Nick Piggin <npiggin@suse.de> | |||
202 | 229 | ||
203 | --------------------------- | 230 | --------------------------- |
204 | 231 | ||
205 | What: Support for the MIPS EV96100 evaluation board | ||
206 | When: September 2006 | ||
207 | Why: Does no longer build since at least November 15, 2003, apparently | ||
208 | no userbase left. | ||
209 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
210 | |||
211 | --------------------------- | ||
212 | |||
213 | What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board | 232 | What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board |
214 | When: September 2006 | 233 | When: September 2006 |
215 | Why: Does no longer build since quite some time, and was never popular, | 234 | Why: Does no longer build since quite some time, and was never popular, |
@@ -294,3 +313,24 @@ Why: The frame diverter is included in most distribution kernels, but is | |||
294 | It is not clear if anyone is still using it. | 313 | It is not clear if anyone is still using it. |
295 | Who: Stephen Hemminger <shemminger@osdl.org> | 314 | Who: Stephen Hemminger <shemminger@osdl.org> |
296 | 315 | ||
316 | --------------------------- | ||
317 | |||
318 | |||
319 | What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment | ||
320 | When: October 2008 | ||
321 | Why: The stacking of class devices makes these values misleading and | ||
322 | inconsistent. | ||
323 | Class devices should not carry any of these properties, and bus | ||
324 | devices have SUBSYSTEM and DRIVER as a replacement. | ||
325 | Who: Kay Sievers <kay.sievers@suse.de> | ||
326 | |||
327 | --------------------------- | ||
328 | |||
329 | What: i2c-isa | ||
330 | When: December 2006 | ||
331 | Why: i2c-isa makes no sense and doesn't fit in the device driver | ||
332 | model. Drivers relying on it are better implemented as platform | ||
333 | drivers. | ||
334 | Who: Jean Delvare <khali@linux-fr.org> | ||
335 | |||
336 | --------------------------- | ||
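The sys_sysctl entry in this file argues that the same information is reachable through /proc/sys. As a rough illustration of that route, the sketch below maps a dotted name such as `kernel.ostype` to its /proc/sys path and reads the value. The dotted naming follows the convention of the sysctl(8) tool; the helper names are hypothetical, and names whose components themselves contain dots (some network interface names, for example) are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Convert a dotted name ("kernel.ostype") into a path under /proc/sys
 * ("/proc/sys/kernel/ostype"). Returns 0 on success, -1 if `out` is
 * too small to hold the result. */
int sysctl_name_to_path(const char *name, char *out, size_t outlen)
{
    char *p;
    int n = snprintf(out, outlen, "/proc/sys/%s", name);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    /* Replace dots only in the part that came from `name`. */
    for (p = out + strlen("/proc/sys/"); *p; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Read the first line of the named entry into `value`, stripping the
 * trailing newline. Returns 0 on success, -1 on failure. */
int read_sysctl(const char *name, char *value, size_t len)
{
    char path[256];
    FILE *f;

    if (sysctl_name_to_path(name, path, sizeof path) != 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(value, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    value[strcspn(value, "\n")] = '\0';
    return 0;
}
```

Because the values are plain text files, no binary name table is needed on the user-space side, which is the maintenance point the entry makes.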
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 99902ae6804e..7db71d6fba82 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt | |||
@@ -1124,11 +1124,15 @@ debugging information is displayed on console. | |||
1124 | NMI switch that most IA32 servers have fires unknown NMI up, for example. | 1124 | NMI switch that most IA32 servers have fires unknown NMI up, for example. |
1125 | If a system hangs up, try pressing the NMI switch. | 1125 | If a system hangs up, try pressing the NMI switch. |
1126 | 1126 | ||
1127 | [NOTE] | 1127 | nmi_watchdog |
1128 | This function and oprofile share a NMI callback. Therefore this function | 1128 | ------------ |
1129 | cannot be enabled when oprofile is activated. | 1129 | |
1130 | And NMI watchdog will be disabled when the value in this file is set to | 1130 | Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero |
1131 | non-zero. | 1131 | the NMI watchdog is enabled and will continuously test all online cpus to |
1132 | determine whether or not they are still functioning properly. | ||
1133 | |||
1134 | Because the NMI watchdog shares registers with oprofile, by disabling the NMI | ||
1135 | watchdog, oprofile may have more registers to utilize. | ||
1132 | 1136 | ||
1133 | 1137 | ||
1134 | 2.4 /proc/sys/vm - The virtual memory subsystem | 1138 | 2.4 /proc/sys/vm - The virtual memory subsystem |
diff --git a/Documentation/i2c/busses/i2c-viapro b/Documentation/i2c/busses/i2c-viapro index 16775663b9f5..25680346e0ac 100644 --- a/Documentation/i2c/busses/i2c-viapro +++ b/Documentation/i2c/busses/i2c-viapro | |||
@@ -7,9 +7,12 @@ Supported adapters: | |||
7 | * VIA Technologies, Inc. VT82C686A/B | 7 | * VIA Technologies, Inc. VT82C686A/B |
8 | Datasheet: Sometimes available at the VIA website | 8 | Datasheet: Sometimes available at the VIA website |
9 | 9 | ||
10 | * VIA Technologies, Inc. VT8231, VT8233, VT8233A, VT8235, VT8237R | 10 | * VIA Technologies, Inc. VT8231, VT8233, VT8233A |
11 | Datasheet: available on request from VIA | 11 | Datasheet: available on request from VIA |
12 | 12 | ||
13 | * VIA Technologies, Inc. VT8235, VT8237R, VT8237A, VT8251 | ||
14 | Datasheet: available on request and under NDA from VIA | ||
15 | |||
13 | Authors: | 16 | Authors: |
14 | Kyösti Mälkki <kmalkki@cc.hut.fi>, | 17 | Kyösti Mälkki <kmalkki@cc.hut.fi>, |
15 | Mark D. Studebaker <mdsxyz123@yahoo.com>, | 18 | Mark D. Studebaker <mdsxyz123@yahoo.com>, |
@@ -39,6 +42,8 @@ Your lspci -n listing must show one of these : | |||
39 | device 1106:8235 (VT8231 function 4) | 42 | device 1106:8235 (VT8231 function 4) |
40 | device 1106:3177 (VT8235) | 43 | device 1106:3177 (VT8235) |
41 | device 1106:3227 (VT8237R) | 44 | device 1106:3227 (VT8237R) |
45 | device 1106:3337 (VT8237A) | ||
46 | device 1106:3287 (VT8251) | ||
42 | 47 | ||
43 | If none of these show up, you should look in the BIOS for settings like | 48 | If none of these show up, you should look in the BIOS for settings like |
44 | enable ACPI / SMBus or even USB. | 49 | enable ACPI / SMBus or even USB. |
diff --git a/Documentation/i2c/i2c-stub b/Documentation/i2c/i2c-stub index d6dcb138abf5..9cc081e69764 100644 --- a/Documentation/i2c/i2c-stub +++ b/Documentation/i2c/i2c-stub | |||
@@ -6,9 +6,12 @@ This module is a very simple fake I2C/SMBus driver. It implements four | |||
6 | types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and | 6 | types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and |
7 | (r/w) word data. | 7 | (r/w) word data. |
8 | 8 | ||
9 | You need to provide a chip address as a module parameter when loading | ||
10 | this driver, which will then only react to SMBus commands to this address. | ||
11 | |||
9 | No hardware is needed nor associated with this module. It will accept write | 12 | No hardware is needed nor associated with this module. It will accept write |
10 | quick commands to all addresses; it will respond to the other commands (also | 13 | quick commands to one address; it will respond to the other commands (also |
11 | to all addresses) by reading from or writing to an array in memory. It will | 14 | to one address) by reading from or writing to an array in memory. It will |
12 | also spam the kernel logs for every command it handles. | 15 | also spam the kernel logs for every command it handles. |
13 | 16 | ||
14 | A pointer register with auto-increment is implemented for all byte | 17 | A pointer register with auto-increment is implemented for all byte |
@@ -21,6 +24,11 @@ The typical use-case is like this: | |||
21 | 3. load the target sensors chip driver module | 24 | 3. load the target sensors chip driver module |
22 | 4. observe its behavior in the kernel log | 25 | 4. observe its behavior in the kernel log |
23 | 26 | ||
27 | PARAMETERS: | ||
28 | |||
29 | int chip_addr: | ||
30 | The SMBus address to emulate a chip at. | ||
31 | |||
24 | CAVEATS: | 32 | CAVEATS: |
25 | 33 | ||
26 | There are independent arrays for byte/data and word/data commands. Depending | 34 | There are independent arrays for byte/data and word/data commands. Depending |
@@ -33,6 +41,9 @@ If the hardware for your driver has banked registers (e.g. Winbond sensors | |||
33 | chips) this module will not work well - although it could be extended to | 41 | chips) this module will not work well - although it could be extended to |
34 | support that pretty easily. | 42 | support that pretty easily. |
35 | 43 | ||
44 | Only one chip address is supported - although this module could be | ||
45 | extended to support more. | ||
46 | |||
36 | If you spam it hard enough, printk can be lossy. This module really wants | 47 | If you spam it hard enough, printk can be lossy. This module really wants |
37 | something like relayfs. | 48 | something like relayfs. |
38 | 49 | ||
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index b7d6abb501a6..e2cbd59cf2d0 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt | |||
@@ -421,6 +421,11 @@ more details, with real examples. | |||
421 | The second argument is optional, and if supplied will be used | 421 | The second argument is optional, and if supplied will be used |
422 | if first argument is not supported. | 422 | if first argument is not supported. |
423 | 423 | ||
424 | as-instr | ||
425 | as-instr checks if the assembler supports a specific instruction | ||
426 | and then outputs either option1 or option2. | ||
427 | C escapes are supported in the test instruction. | ||
428 | |||
424 | cc-option | 429 | cc-option |
425 | cc-option is used to check if $(CC) supports a given option, and if not | 430 | cc-option is used to check if $(CC) supports a given option, and if not |
426 | supported, to use an optional second option. | 431 | supported, to use an optional second option. |
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 71d05f481727..54983246930d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -573,8 +573,6 @@ running once the system is up. | |||
573 | gscd= [HW,CD] | 573 | gscd= [HW,CD] |
574 | Format: <io> | 574 | Format: <io> |
575 | 575 | ||
576 | gt96100eth= [NET] MIPS GT96100 Advanced Communication Controller | ||
577 | |||
578 | gus= [HW,OSS] | 576 | gus= [HW,OSS] |
579 | Format: <io>,<irq>,<dma>,<dma16> | 577 | Format: <io>,<irq>,<dma>,<dma16> |
580 | 578 | ||
@@ -1240,7 +1238,11 @@ running once the system is up. | |||
1240 | bootloader. This is currently used on | 1238 | bootloader. This is currently used on |
1241 | IXP2000 systems where the bus has to be | 1239 | IXP2000 systems where the bus has to be |
1242 | configured a certain way for adjunct CPUs. | 1240 | configured a certain way for adjunct CPUs. |
1243 | 1241 | noearly [X86] Don't do any early type 1 scanning. | |
1242 | This might help on some broken boards which | ||
1243 | machine check when some devices' config space | ||
1244 | is read. But various workarounds are disabled | ||
1245 | and some IOMMU drivers will not work. | ||
1244 | pcmv= [HW,PCMCIA] BadgePAD 4 | 1246 | pcmv= [HW,PCMCIA] BadgePAD 4 |
1245 | 1247 | ||
1246 | pd. [PARIDE] | 1248 | pd. [PARIDE] |
@@ -1363,6 +1365,14 @@ running once the system is up. | |||
1363 | 1365 | ||
1364 | reserve= [KNL,BUGS] Force the kernel to ignore some iomem area | 1366 | reserve= [KNL,BUGS] Force the kernel to ignore some iomem area |
1365 | 1367 | ||
1368 | reservetop= [IA-32] | ||
1369 | Format: nn[KMG] | ||
1370 | Reserves a hole at the top of the kernel virtual | ||
1371 | address space. | ||
1372 | |||
1373 | reset_devices [KNL] Force drivers to reset the underlying device | ||
1374 | during initialization. | ||
1375 | |||
1366 | resume= [SWSUSP] | 1376 | resume= [SWSUSP] |
1367 | Specify the partition device for software suspend | 1377 | Specify the partition device for software suspend |
1368 | 1378 | ||
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index afac780445cd..dc942eaf490f 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt | |||
@@ -192,6 +192,17 @@ or, for backwards compatibility, the option value. E.g., | |||
192 | arp_interval | 192 | arp_interval |
193 | 193 | ||
194 | Specifies the ARP link monitoring frequency in milliseconds. | 194 | Specifies the ARP link monitoring frequency in milliseconds. |
195 | |||
196 | The ARP monitor works by periodically checking the slave | ||
197 | devices to determine whether they have sent or received | ||
198 | traffic recently (the precise criteria depend upon the | ||
199 | bonding mode, and the state of the slave). Regular traffic is | ||
200 | generated via ARP probes issued for the addresses specified by | ||
201 | the arp_ip_target option. | ||
202 | |||
203 | This behavior can be modified by the arp_validate option, | ||
204 | below. | ||
205 | |||
195 | If ARP monitoring is used in an etherchannel compatible mode | 206 | If ARP monitoring is used in an etherchannel compatible mode |
196 | (modes 0 and 2), the switch should be configured in a mode | 207 | (modes 0 and 2), the switch should be configured in a mode |
197 | that evenly distributes packets across all links. If the | 208 | that evenly distributes packets across all links. If the |
@@ -213,6 +224,54 @@ arp_ip_target | |||
213 | maximum number of targets that can be specified is 16. The | 224 | maximum number of targets that can be specified is 16. The |
214 | default value is no IP addresses. | 225 | default value is no IP addresses. |
215 | 226 | ||
227 | arp_validate | ||
228 | |||
229 | Specifies whether or not ARP probes and replies should be | ||
230 | validated in the active-backup mode. This causes the ARP | ||
231 | monitor to examine the incoming ARP requests and replies, and | ||
232 | only consider a slave to be up if it is receiving the | ||
233 | appropriate ARP traffic. | ||
234 | |||
235 | Possible values are: | ||
236 | |||
237 | none or 0 | ||
238 | |||
239 | No validation is performed. This is the default. | ||
240 | |||
241 | active or 1 | ||
242 | |||
243 | Validation is performed only for the active slave. | ||
244 | |||
245 | backup or 2 | ||
246 | |||
247 | Validation is performed only for backup slaves. | ||
248 | |||
249 | all or 3 | ||
250 | |||
251 | Validation is performed for all slaves. | ||
252 | |||
253 | For the active slave, the validation checks ARP replies to | ||
254 | confirm that they were generated by an arp_ip_target. Since | ||
255 | backup slaves do not typically receive these replies, the | ||
256 | validation performed for backup slaves is on the ARP request | ||
257 | sent out via the active slave. It is possible that some | ||
258 | switch or network configurations may result in situations | ||
259 | wherein the backup slaves do not receive the ARP requests; in | ||
260 | such a situation, validation of backup slaves must be | ||
261 | disabled. | ||
262 | |||
263 | This option is useful in network configurations in which | ||
264 | multiple bonding hosts are concurrently issuing ARPs to one or | ||
265 | more targets beyond a common switch. Should the link between | ||
266 | the switch and target fail (but not the switch itself), the | ||
267 | probe traffic generated by the multiple bonding instances will | ||
268 | fool the standard ARP monitor into considering the links as | ||
269 | still up. Use of the arp_validate option can resolve this, as | ||
270 | the ARP monitor will only consider ARP requests and replies | ||
271 | associated with its own instance of bonding. | ||
272 | |||
273 | This option was added in bonding version 3.1.0. | ||
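The arp_validate values listed above can be read as a two-bit mask: bit 0 selects the active slave and bit 1 selects backup slaves. The following is a minimal userspace sketch of that mapping, not the bonding driver's own code; the function names arp_validate_value() and arp_validate_applies() are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: translate an arp_validate setting ("none", "active",
 * "backup", "all", or "0".."3") into its numeric value.
 * Returns -1 if the text is not one of the documented values.
 */
int arp_validate_value(const char *s)
{
	static const char *const names[] = { "none", "active", "backup", "all" };
	size_t i;

	if (s == NULL)
		return -1;
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strcmp(s, names[i]) == 0)
			return (int)i;
	if (strlen(s) == 1 && s[0] >= '0' && s[0] <= '3')
		return s[0] - '0';
	return -1;
}

/*
 * Hypothetical helper: report whether validation applies to a slave.
 * Bit 0 of the value covers the active slave, bit 1 covers backup slaves.
 */
int arp_validate_applies(int value, int slave_is_active)
{
	if (value < 0 || value > 3)
		return 0;
	return slave_is_active ? (value & 1) != 0 : (value & 2) != 0;
}
```

With this reading, "all" (3) is simply "active" (1) and "backup" (2) combined.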
274 | |||
216 | downdelay | 275 | downdelay |
217 | 276 | ||
218 | Specifies the time, in milliseconds, to wait before disabling | 277 | Specifies the time, in milliseconds, to wait before disabling |
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index c45daabd3bfe..74563b38ffd9 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt | |||
@@ -1,7 +1,6 @@ | |||
1 | DCCP protocol | 1 | DCCP protocol |
2 | ============ | 2 | ============ |
3 | 3 | ||
4 | Last updated: 10 November 2005 | ||
5 | 4 | ||
6 | Contents | 5 | Contents |
7 | ======== | 6 | ======== |
@@ -42,8 +41,11 @@ Socket options | |||
42 | DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for | 41 | DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for |
43 | calculations. | 42 | calculations. |
44 | 43 | ||
45 | DCCP_SOCKOPT_SERVICE sets the service. This is compulsory as per the | 44 | DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of |
46 | specification. If you don't set it you will get EPROTO. | 45 | service codes (RFC 4340, sec. 8.1.2); if this socket option is not set, |
46 | the socket will fall back to 0 (which means that no meaningful service code | ||
47 | is present). Connecting sockets set at most one service option; for | ||
48 | listening sockets, multiple service codes can be specified. | ||
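A minimal sketch of setting a service code on a socket follows. It assumes that the service code is passed as a 32-bit value in network byte order, and it falls back to the values 269 for SOL_DCCP and 2 for DCCP_SOCKOPT_SERVICE when the system headers do not define them; treat both as assumptions to check against your headers. The helper name dccp_set_service() is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Fallback definitions (assumed values) for headers that lack them. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_SERVICE
#define DCCP_SOCKOPT_SERVICE 2
#endif

/*
 * Hypothetical helper: set a single service code on a DCCP socket.
 * Returns the setsockopt() result: 0 on success, -1 with errno set on error.
 */
int dccp_set_service(int fd, uint32_t service_code)
{
	/* Assumption: the kernel expects the code in network byte order. */
	uint32_t wire = htonl(service_code);

	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE,
			  &wire, sizeof(wire));
}
```

A connecting socket would call this once before connect(); the call fails with -1 if the descriptor is not a valid socket.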
47 | 49 | ||
48 | Notes | 50 | Notes |
49 | ===== | 51 | ===== |
diff --git a/Documentation/nommu-mmap.txt b/Documentation/nommu-mmap.txt index b88ebe4d808c..7714f57caad5 100644 --- a/Documentation/nommu-mmap.txt +++ b/Documentation/nommu-mmap.txt | |||
@@ -116,6 +116,9 @@ FURTHER NOTES ON NO-MMU MMAP | |||
116 | (*) A list of all the mappings on the system is visible through /proc/maps in | 116 | (*) A list of all the mappings on the system is visible through /proc/maps in |
117 | no-MMU mode. | 117 | no-MMU mode. |
118 | 118 | ||
119 | (*) A list of all the mappings in use by a process is visible through | ||
120 | /proc/<pid>/maps in no-MMU mode. | ||
121 | |||
119 | (*) Supplying MAP_FIXED or requesting a particular mapping address will | 122 | (*) Supplying MAP_FIXED or requesting a particular mapping address will |
120 | result in an error. | 123 | result in an error. |
121 | 124 | ||
@@ -125,6 +128,49 @@ FURTHER NOTES ON NO-MMU MMAP | |||
125 | error will result if they don't. This is most likely to be encountered | 128 | error will result if they don't. This is most likely to be encountered |
126 | with character device files, pipes, fifos and sockets. | 129 | with character device files, pipes, fifos and sockets. |
127 | 130 | ||
131 | |||
132 | ========================== | ||
133 | INTERPROCESS SHARED MEMORY | ||
134 | ========================== | ||
135 | |||
136 | Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU | ||
137 | mode. The former through the usual mechanism, the latter through files created | ||
138 | on ramfs or tmpfs mounts. | ||
139 | |||
140 | |||
141 | ======= | ||
142 | FUTEXES | ||
143 | ======= | ||
144 | |||
145 | Futexes are supported in NOMMU mode if the arch supports them. An error will | ||
146 | be given if an address passed to the futex system call lies outside the | ||
147 | mappings made by a process or if the mapping in which the address lies does not | ||
148 | support futexes (such as an I/O chardev mapping). | ||
149 | |||
150 | |||
151 | ============= | ||
152 | NO-MMU MREMAP | ||
153 | ============= | ||
154 | |||
155 | The mremap() function is partially supported. It may change the size of a | ||
156 | mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size | ||
157 | of the mapping exceeds the size of the slab object currently occupied by the | ||
158 | memory to which the mapping refers, or if a smaller slab object could be used. | ||
159 | |||
160 | MREMAP_FIXED is not supported, though it is ignored if there's no change of | ||
161 | address and the object does not need to be moved. | ||
162 | |||
163 | Shared mappings may not be moved. Shareable mappings may not be moved either, | ||
164 | even if they are not currently shared. | ||
165 | |||
166 | The mremap() function must be given an exact match for base address and size of | ||
167 | a previously mapped object. It may not be used to create holes in existing | ||
168 | mappings, move parts of existing mappings or resize parts of mappings. It must | ||
169 | act on a complete mapping. | ||
170 | |||
171 | [*] Not currently supported. | ||
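The rule that mremap() must act on a complete mapping can be illustrated with a short userspace sketch. The helper names map_private() and resize_whole_mapping() are hypothetical; the sketch only shows the calling pattern (exact base address and exact original size) and runs the same way on an MMU system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for mremap() and MREMAP_MAYMOVE */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: create a private anonymous mapping of 'size' bytes. */
void *map_private(size_t size)
{
	assert(size > 0);
	return mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/*
 * Hypothetical helper: resize a whole mapping. 'base' and 'old_size' must
 * match the original mmap() exactly; partial ranges are what the text above
 * says NOMMU mremap() rejects. Returns MAP_FAILED on error.
 */
void *resize_whole_mapping(void *base, size_t old_size, size_t new_size)
{
	assert(old_size > 0 && new_size > 0);
	return mremap(base, old_size, new_size, MREMAP_MAYMOVE);
}
```

Callers should always use the returned pointer, because MREMAP_MAYMOVE allows the mapping to change address where moving is supported.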
172 | |||
173 | |||
128 | ============================================ | 174 | ============================================ |
129 | PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT | 175 | PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT |
130 | ============================================ | 176 | ============================================ |
diff --git a/Documentation/pcieaer-howto.txt b/Documentation/pcieaer-howto.txt new file mode 100644 index 000000000000..16c251230c82 --- /dev/null +++ b/Documentation/pcieaer-howto.txt | |||
@@ -0,0 +1,253 @@ | |||
1 | The PCI Express Advanced Error Reporting Driver Guide HOWTO | ||
2 | T. Long Nguyen <tom.l.nguyen@intel.com> | ||
3 | Yanmin Zhang <yanmin.zhang@intel.com> | ||
4 | 07/29/2006 | ||
5 | |||
6 | |||
7 | 1. Overview | ||
8 | |||
9 | 1.1 About this guide | ||
10 | |||
11 | This guide describes the basics of the PCI Express Advanced Error | ||
12 | Reporting (AER) driver and provides information on how to use it, as | ||
13 | well as how to enable the drivers of endpoint devices to conform with | ||
14 | the PCI Express AER driver. | ||
15 | |||
16 | 1.2 Copyright © Intel Corporation 2006. | ||
17 | |||
18 | 1.3 What is the PCI Express AER Driver? | ||
19 | |||
20 | PCI Express error signaling can occur on the PCI Express link itself | ||
21 | or on behalf of transactions initiated on the link. PCI Express | ||
22 | defines two error reporting paradigms: the baseline capability and | ||
23 | the Advanced Error Reporting capability. The baseline capability is | ||
24 | required of all PCI Express components providing a minimum defined | ||
25 | set of error reporting requirements. Advanced Error Reporting | ||
26 | capability is implemented with a PCI Express advanced error reporting | ||
27 | extended capability structure providing more robust error reporting. | ||
28 | |||
29 | The PCI Express AER driver provides the infrastructure to support PCI | ||
30 | Express Advanced Error Reporting capability. The PCI Express AER | ||
31 | driver provides three basic functions: | ||
32 | |||
33 | - Gathers the comprehensive error information if errors occurred. | ||
34 | - Reports error to the users. | ||
35 | - Performs error recovery actions. | ||
36 | |||
37 | The AER driver attaches only to root ports that support the PCI Express | ||
38 | AER capability. | ||
39 | |||
40 | |||
41 | 2. User Guide | ||
42 | |||
43 | 2.1 Include the PCI Express AER Root Driver into the Linux Kernel | ||
44 | |||
45 | The PCI Express AER Root driver is a Root Port service driver attached | ||
46 | to the PCI Express Port Bus driver. If a user wants to use it, the driver | ||
47 | has to be compiled. Option CONFIG_PCIEAER supports this capability. It | ||
48 | depends on CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and | ||
49 | CONFIG_PCIEAER=y. | ||
50 | |||
51 | 2.2 Load PCI Express AER Root Driver | ||
52 | Some systems have AER support in the BIOS. Enabling the AER Root driver | ||
53 | while the BIOS also handles AER may result in unpredictable | ||
54 | behavior. To avoid this conflict, a successful load of the AER Root driver | ||
55 | requires ACPI _OSC support in the BIOS, which allows the AER Root driver to | ||
56 | request native control of AER. See the PCI FW 3.0 Specification for | ||
57 | details regarding _OSC usage. Currently, much firmware does not provide | ||
58 | _OSC support even though it uses PCI Express. To support such firmware, | ||
59 | forceload, a parameter of type bool, lets AER initialization | ||
60 | proceed even when the firmware has no _OSC support. To enable the | ||
61 | workaround, add aerdriver.forceload=y to the kernel boot parameter line | ||
62 | when booting the kernel. Note that forceload=n by default. | ||
63 | |||
64 | 2.3 AER error output | ||
65 | When a PCI-E AER error is captured, an error message is printed to the | ||
66 | console. A correctable error is printed as a warning; | ||
67 | any other error is printed as an error. Users can therefore choose a | ||
68 | log level that filters out correctable error messages. | ||
69 | |||
70 | An example is shown below. | ||
71 | +------ PCI-Express Device Error -----+ | ||
72 | Error Severity : Uncorrected (Fatal) | ||
73 | PCIE Bus Error type : Transaction Layer | ||
74 | Unsupported Request : First | ||
75 | Requester ID : 0500 | ||
76 | VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h | ||
77 | TLB Header: | ||
78 | 04000001 00200a03 05010000 00050100 | ||
79 | |||
80 | In the example, 'Requester ID' means the ID of the device that sends | ||
81 | the error message to the root port. Please refer to the PCI Express | ||
82 | specifications for the other fields. | ||
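The relationship between the Requester ID and the Bus, Device and Function values in the sample output can be sketched as follows. This assumes the standard PCI Express layout of a 16-bit requester ID (8-bit bus number, 5-bit device number, 3-bit function number); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of a requester ID. */
struct pcie_bdf {
	uint8_t bus;		/* bits 15:8 */
	uint8_t device;		/* bits 7:3  */
	uint8_t function;	/* bits 2:0  */
};

/* Split a 16-bit requester ID into bus, device and function numbers. */
struct pcie_bdf decode_requester_id(uint16_t id)
{
	struct pcie_bdf r;

	r.bus = (uint8_t)(id >> 8);
	r.device = (uint8_t)((id >> 3) & 0x1f);
	r.function = (uint8_t)(id & 0x07);
	return r;
}
```

For the ID 0500 in the example this gives bus 05h, device 00h and function 00h, matching the line that follows it in the log.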
83 | |||
84 | |||
85 | 3. Developer Guide | ||
86 | |||
87 | Enabling AER-aware support requires a software driver to configure | ||
88 | the AER capability structure within its device and to provide callbacks. | ||
89 | |||
90 | To support AER well, developers first need to understand how AER | ||
91 | works. | ||
92 | |||
93 | PCI Express errors are classified into two types: correctable errors | ||
94 | and uncorrectable errors. This classification is based on the impacts | ||
95 | of those errors, which may result in degraded performance or function | ||
96 | failure. | ||
97 | |||
98 | Correctable errors have no impact on the functionality of the | ||
99 | interface. The PCI Express protocol can recover without any software | ||
100 | intervention or any loss of data. These errors are detected and | ||
101 | corrected by hardware. Unlike correctable errors, uncorrectable | ||
102 | errors impact functionality of the interface. Uncorrectable errors | ||
103 | can cause a particular transaction or a particular PCI Express link | ||
104 | to be unreliable. Depending on those error conditions, uncorrectable | ||
105 | errors are further classified into non-fatal errors and fatal errors. | ||
106 | Non-fatal errors cause the particular transaction to be unreliable, | ||
107 | but the PCI Express link itself is fully functional. Fatal errors, on | ||
108 | the other hand, cause the link to be unreliable. | ||
109 | |||
110 | When AER is enabled, a PCI Express device will automatically send an | ||
111 | error message to the PCIE root port above it when the device captures | ||
112 | an error. The Root Port, upon receiving an error reporting message, | ||
113 | internally processes and logs the error message in its PCI Express | ||
114 | capability structure. Logging the error includes storing | ||
115 | the error reporting agent's requester ID into the Error Source | ||
116 | Identification Registers and setting the error bits of the Root Error | ||
117 | Status Register accordingly. If AER error reporting is enabled in Root | ||
118 | Error Command Register, the Root Port generates an interrupt if an | ||
119 | error is detected. | ||
120 | |||
121 | Note that the errors as described above are related to the PCI Express | ||
122 | hierarchy and links. These errors do not include any device specific | ||
123 | errors because device specific errors will still get sent directly to | ||
124 | the device driver. | ||
125 | |||
126 | 3.1 Configure the AER capability structure | ||
127 | |||
128 | AER-aware drivers of PCI Express components need to change the device | ||
129 | control registers to enable AER. They can also change AER registers, | ||
130 | including the mask and severity registers. The helper function | ||
131 | pci_enable_pcie_error_reporting can be used to enable AER. See | ||
132 | section 3.3. | ||
133 | |||
134 | 3.2. Provide callbacks | ||
135 | |||
136 | 3.2.1 callback reset_link to reset pci express link | ||
137 | |||
138 | This callback is used to reset the pci express physical link when a | ||
139 | fatal error happens. The root port aer service driver provides a | ||
140 | default reset_link function, but different upstream ports might | ||
141 | have different specifications to reset pci express link, so all | ||
142 | upstream ports should provide their own reset_link functions. | ||
143 | |||
144 | In struct pcie_port_service_driver, a new pointer, reset_link, is | ||
145 | added. | ||
146 | |||
147 | pci_ers_result_t (*reset_link) (struct pci_dev *dev); | ||
148 | |||
149 | Section 3.2.2.2 provides more detailed info on when to call | ||
150 | reset_link. | ||
151 | |||
152 | 3.2.2 PCI error-recovery callbacks | ||
153 | |||
154 | The PCI Express AER Root driver uses error callbacks to coordinate | ||
155 | with downstream device drivers associated with a hierarchy in question | ||
156 | when performing error recovery actions. | ||
157 | |||
158 | The data structure pci_driver has a pointer, err_handler, that points to | ||
159 | pci_error_handlers, which consists of several callback function | ||
160 | pointers. The AER driver follows the rules defined in | ||
161 | pci-error-recovery.txt except for the PCI Express specific parts (e.g. | ||
162 | reset_link). Please refer to pci-error-recovery.txt for detailed | ||
163 | definitions of the callbacks. | ||
164 | |||
165 | The sections below specify when the error callback functions are called. | ||
166 | |||
167 | 3.2.2.1 Correctable errors | ||
168 | |||
169 | Correctable errors have no impact on the functionality of | ||
170 | the interface. The PCI Express protocol can recover without any | ||
171 | software intervention or any loss of data. These errors do not | ||
172 | require any recovery actions. The AER driver clears the device's | ||
173 | correctable error status register accordingly and logs these errors. | ||
174 | |||
175 | 3.2.2.2 Non-correctable (non-fatal and fatal) errors | ||
176 | |||
177 | If an error message indicates a non-fatal error, performing a link reset | ||
178 | upstream is not required. The AER driver calls error_detected(dev, | ||
179 | pci_channel_io_normal) on all drivers associated with the hierarchy in | ||
180 | question. For example, | ||
181 | EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort. | ||
182 | If Upstream port A captures an AER error, the hierarchy consists of | ||
183 | Downstream port B and EndPoint. | ||
184 | |||
185 | A driver may return PCI_ERS_RESULT_CAN_RECOVER, | ||
186 | PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on | ||
187 | whether it can recover; that result decides whether the AER driver calls mmio_enabled next. | ||
188 | |||
189 | If an error message indicates a fatal error, the kernel broadcasts | ||
190 | error_detected(dev, pci_channel_io_frozen) to all drivers within | ||
191 | the hierarchy in question. A link reset upstream is then | ||
192 | necessary. As different kinds of devices might use different approaches | ||
193 | to reset the link, the AER port service driver is required to provide the | ||
194 | function that resets the link. First, the kernel checks whether the upstream | ||
195 | component has an AER driver. If it does, the kernel uses the reset_link | ||
196 | callback of that driver. If the upstream component has no AER driver | ||
197 | and the port is a downstream port, the kernel uses the AER driver of the | ||
198 | root port that reported the AER error. Upstream ports | ||
199 | should provide their own AER service drivers with a reset_link | ||
200 | function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and | ||
201 | reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes | ||
202 | to mmio_enabled. | ||
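The handling described in sections 3.2.2.1 and 3.2.2.2 can be summarized as a small decision table. The following is a userspace model only, not the AER driver's code: the enum and struct names are hypothetical, and the two channel states merely stand in for pci_channel_io_normal and pci_channel_io_frozen.

```c
/* Hypothetical model of the three error classes described in the text. */
enum aer_severity { AER_CORRECTABLE, AER_NONFATAL, AER_FATAL };

/* Stand-ins for pci_channel_io_normal and pci_channel_io_frozen. */
enum channel_state { CHANNEL_IO_NORMAL, CHANNEL_IO_FROZEN };

/* What the recovery path does for a given severity. */
struct aer_plan {
	int call_error_detected;	/* broadcast error_detected()? */
	enum channel_state state;	/* state passed to error_detected() */
	int reset_link;			/* reset the link upstream? */
};

struct aer_plan aer_plan_for(enum aer_severity sev)
{
	struct aer_plan plan = { 0, CHANNEL_IO_NORMAL, 0 };

	switch (sev) {
	case AER_CORRECTABLE:
		/* Status is cleared and the error logged; no recovery. */
		break;
	case AER_NONFATAL:
		plan.call_error_detected = 1;
		plan.state = CHANNEL_IO_NORMAL;
		break;
	case AER_FATAL:
		plan.call_error_detected = 1;
		plan.state = CHANNEL_IO_FROZEN;
		plan.reset_link = 1;
		break;
	}
	return plan;
}
```

Only the fatal case involves reset_link, which is why upstream ports are asked to provide that callback.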
203 | |||
204 | 3.3 helper functions | ||
205 | |||
206 | 3.3.1 int pci_find_aer_capability(struct pci_dev *dev); | ||
207 | pci_find_aer_capability locates the PCI Express AER capability | ||
208 | in the device configuration space. If the device doesn't support | ||
209 | PCI-Express AER, the function returns 0. | ||
210 | |||
211 | 3.3.2 int pci_enable_pcie_error_reporting(struct pci_dev *dev); | ||
212 | pci_enable_pcie_error_reporting enables the device to send error | ||
213 | messages to the root port when an error is detected. Note that devices | ||
214 | don't enable error reporting by default, so device drivers need to | ||
215 | call this function to enable it. | ||
216 | |||
217 | 3.3.3 int pci_disable_pcie_error_reporting(struct pci_dev *dev); | ||
218 | pci_disable_pcie_error_reporting stops the device from sending error | ||
219 | messages to the root port when an error is detected. | ||
220 | |||
221 | 3.3.4 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev); | ||
222 | pci_cleanup_aer_uncorrect_error_status clears the uncorrectable | ||
223 | error status register. | ||
224 | |||
225 | 3.4 Frequently Asked Questions | ||
226 | |||
227 | Q: What happens if a PCI Express device driver does not provide an | ||
228 | error recovery handler (pci_driver->err_handler is equal to NULL)? | ||
229 | |||
230 | A: The devices attached to the driver won't be recovered. If the | ||
231 | error is fatal, the kernel will print warning messages. Please refer | ||
232 | to section 3 for more information. | ||
233 | |||
234 | Q: What happens if an upstream port service driver does not provide | ||
235 | callback reset_link? | ||
236 | |||
237 | A: Fatal error recovery will fail if the errors are reported by the | ||
238 | upstream ports to which the service driver is attached. | ||
239 | |||
240 | Q: How does this infrastructure deal with a driver that is not PCI | ||
241 | Express aware? | ||
242 | |||
243 | A: This infrastructure calls the error callback functions of the | ||
244 | driver when an error happens. But if the driver is not aware of | ||
245 | PCI Express, the device might not report its own errors to root | ||
246 | port. | ||
247 | |||
248 | Q: What modifications will that driver need to make it compatible | ||
249 | with the PCI Express AER Root driver? | ||
250 | |||
251 | A: It could call the helper functions to enable AER in devices and | ||
252 | clean up the uncorrectable status register. Please refer to section 3.3. | ||
253 | |||
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index fba1e05c47c7..d0e79d5820a5 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt | |||
@@ -1,208 +1,553 @@ | |||
1 | Most of the code in Linux is device drivers, so most of the Linux power | ||
2 | management code is also driver-specific. Most drivers will do very little; | ||
3 | others, especially for platforms with small batteries (like cell phones), | ||
4 | will do a lot. | ||
5 | |||
6 | This writeup gives an overview of how drivers interact with system-wide | ||
7 | power management goals, emphasizing the models and interfaces that are | ||
8 | shared by everything that hooks up to the driver model core. Read it as | ||
9 | background for the domain-specific work you'd do with any specific driver. | ||
10 | |||
11 | |||
12 | Two Models for Device Power Management | ||
13 | ====================================== | ||
14 | Drivers will use one or both of these models to put devices into low-power | ||
15 | states: | ||
16 | |||
17 | System Sleep model: | ||
18 | Drivers can enter low power states as part of entering system-wide | ||
19 | low-power states like "suspend-to-ram", or (mostly for systems with | ||
20 | disks) "hibernate" (suspend-to-disk). | ||
21 | |||
22 | This is something that device, bus, and class drivers collaborate on | ||
23 | by implementing various role-specific suspend and resume methods to | ||
24 | cleanly power down hardware and software subsystems, then reactivate | ||
25 | them without loss of data. | ||
26 | |||
27 | Some drivers can manage hardware wakeup events, which make the system | ||
28 | leave that low-power state. This feature may be disabled using the | ||
29 | relevant /sys/devices/.../power/wakeup file; enabling it may cost some | ||
30 | power usage, but lets the whole system enter low power states more often. | ||
31 | |||
32 | Runtime Power Management model: | ||
33 | Drivers may also enter low power states while the system is running, | ||
34 | independently of other power management activity. Upstream drivers | ||
35 | will normally not know (or care) if the device is in some low power | ||
36 | state when issuing requests; the driver will auto-resume anything | ||
37 | that's needed when it gets a request. | ||
38 | |||
39 | This doesn't have, or need, much infrastructure; it's just something you | ||
40 | should do when writing your drivers. For example, clk_disable() unused | ||
41 | clocks as part of minimizing power drain for currently-unused hardware. | ||
42 | Of course, sometimes clusters of drivers will collaborate with each | ||
43 | other, which could involve task-specific power management. | ||
44 | |||
45 | There's not a lot to be said about those low power states except that they | ||
46 | are very system-specific, and often device-specific. Also, that if enough | ||
47 | drivers put themselves into low power states (at "runtime"), the effect may be | ||
48 | the same as entering some system-wide low-power state (system sleep) ... and | ||
49 | that synergies exist, so that several drivers using runtime pm might put the | ||
50 | system into a state where even deeper power saving options are available. | ||
51 | |||
52 | Most suspended devices will have quiesced all I/O: no more DMA or irqs, no | ||
53 | more data read or written, and requests from upstream drivers are no longer | ||
54 | accepted. A given bus or platform may have different requirements though. | ||
55 | |||
56 | Examples of hardware wakeup events include an alarm from a real time clock, | ||
57 | network wake-on-LAN packets, keyboard or mouse activity, and media insertion | ||
58 | or removal (for PCMCIA, MMC/SD, USB, and so on). | ||
59 | |||
60 | |||
61 | Interfaces for Entering System Sleep States | ||
62 | =========================================== | ||
63 | Most of the programming interfaces a device driver needs to know about | ||
64 | relate to that first model: entering a system-wide low power state, | ||
65 | rather than just minimizing power consumption by one device. | ||
66 | |||
67 | |||
68 | Bus Driver Methods | ||
69 | ------------------ | ||
70 | The core methods to suspend and resume devices reside in struct bus_type. | ||
71 | These are mostly of interest to people writing infrastructure for busses | ||
72 | like PCI or USB, or because they define the primitives that device drivers | ||
73 | may need to apply in domain-specific ways to their devices: | ||
1 | 74 | ||
2 | Device Power Management | 75 | struct bus_type { |
76 | ... | ||
77 | int (*suspend)(struct device *dev, pm_message_t state); | ||
78 | int (*suspend_late)(struct device *dev, pm_message_t state); | ||
3 | 79 | ||
80 | int (*resume_early)(struct device *dev); | ||
81 | int (*resume)(struct device *dev); | ||
82 | }; | ||
4 | 83 | ||
5 | Device power management encompasses two areas - the ability to save | 84 | Bus drivers implement those methods as appropriate for the hardware and |
6 | state and transition a device to a low-power state when the system is | 85 | the drivers using it; PCI works differently from USB, and so on. Not many |
7 | entering a low-power state; and the ability to transition a device to | 86 | people write bus drivers; most driver code is a "device driver" that |
8 | a low-power state while the system is running (and independently of | 87 | builds on top of bus-specific framework code. |
9 | any other power management activity). | 88 | |
89 | For more information on these driver calls, see the description later; | ||
90 | they are called in phases for every device, respecting the parent-child | ||
91 | sequencing in the driver model tree. Note that as this is being written, | ||
92 | only suspend() and resume() are widely available; not many bus drivers | ||
93 | leverage all of those phases, or pass them down to lower driver levels. | ||
94 | |||
95 | |||
96 | /sys/devices/.../power/wakeup files | ||
97 | ----------------------------------- | ||
98 | All devices in the driver model have two flags to control handling of | ||
99 | wakeup events, which are hardware signals that can force the device and/or | ||
100 | system out of a low power state. These are initialized by bus or device | ||
101 | driver code using device_init_wakeup(dev,can_wakeup). | ||
102 | |||
103 | The "can_wakeup" flag just records whether the device (and its driver) can | ||
104 | physically support wakeup events. When that flag is clear, the sysfs | ||
105 | "wakeup" file is empty, and device_may_wakeup() returns false. | ||
106 | |||
107 | For devices that can issue wakeup events, a separate flag controls whether | ||
108 | that device should try to use its wakeup mechanism. The initial value of | ||
109 | device_may_wakeup() will be true, so that the device's "wakeup" file holds | ||
110 | the value "enabled". Userspace can change that to "disabled" so that | ||
111 | device_may_wakeup() returns false; or change it back to "enabled" (so that | ||
112 | it returns true again). | ||
113 | |||
114 | |||
115 | EXAMPLE: PCI Device Driver Methods | ||
116 | ----------------------------------- | ||
117 | PCI framework software calls these methods when the PCI device driver bound | ||
118 | to a device has provided them: | ||
119 | |||
120 | struct pci_driver { | ||
121 | ... | ||
122 | int (*suspend)(struct pci_dev *pdev, pm_message_t state); | ||
123 | int (*suspend_late)(struct pci_dev *pdev, pm_message_t state); | ||
124 | |||
125 | int (*resume_early)(struct pci_dev *pdev); | ||
126 | int (*resume)(struct pci_dev *pdev); | ||
127 | }; | ||
10 | 128 | ||
129 | Drivers will implement those methods, and call PCI-specific procedures | ||
130 | like pci_set_power_state(), pci_enable_wake(), pci_save_state(), and | ||
131 | pci_restore_state() to manage PCI-specific mechanisms. (PCI config space | ||
132 | could be saved during driver probe, if it weren't for the fact that some | ||
133 | systems rely on userspace tweaking using setpci.) Devices are suspended | ||
134 | before their bridges enter low power states, and likewise bridges resume | ||
135 | before their devices. | ||
136 | |||
137 | |||
138 | Upper Layers of Driver Stacks | ||
139 | ----------------------------- | ||
140 | Device drivers generally have at least two interfaces, and the methods | ||
141 | sketched above are the ones which apply to the lower level (nearer PCI, USB, | ||
142 | or other bus hardware). The network and block layers are examples of upper | ||
143 | level interfaces, as is a character device talking to userspace. | ||
144 | |||
145 | Power management requests normally need to flow through those upper levels, | ||
146 | which often use domain-oriented requests like "blank that screen". In | ||
147 | some cases those upper levels will have power management intelligence that | ||
148 | relates to end-user activity, or other devices that work in cooperation. | ||
149 | |||
150 | When those interfaces are structured using class interfaces, there is a | ||
151 | standard way to have the upper layer stop issuing requests to a given | ||
152 | class device (and restart later): | ||
153 | |||
154 | struct class { | ||
155 | ... | ||
156 | int (*suspend)(struct device *dev, pm_message_t state); | ||
157 | int (*resume)(struct device *dev); | ||
158 | }; | ||
11 | 159 | ||
12 | Methods | 160 | Those calls are issued in specific phases of the process by which the |
161 | system enters a low power "suspend" state, or resumes from it. | ||
162 | |||
163 | |||
164 | Calling Drivers to Enter System Sleep States | ||
165 | ============================================ | ||
166 | When the system enters a low power state, each device's driver is asked | ||
167 | to suspend the device by putting it into a state compatible with the target | ||
168 | system state. That's usually some version of "off", but the details are | ||
169 | system-specific. Also, wakeup-enabled devices will usually stay partly | ||
170 | functional in order to wake the system. | ||
171 | |||
172 | When the system leaves that low power state, the device's driver is asked | ||
173 | to resume it. The suspend and resume operations always go together, and | ||
174 | both are multi-phase operations. | ||
175 | |||
176 | For simple drivers, suspend might quiesce the device using the class code | ||
177 | and then turn its hardware as "off" as possible with suspend_late(). The | ||
178 | matching resume calls would then completely reinitialize the hardware | ||
179 | before reactivating its class I/O queues. | ||
180 | |||
181 | More power-aware drivers will use more than one device low power | ||
182 | state, either at runtime or during system sleep states, and might trigger | ||
183 | system wakeup events. | ||
184 | |||
185 | |||
186 | Call Sequence Guarantees | ||
187 | ------------------------ | ||
188 | To ensure that bridges and similar links needed to talk to a device are | ||
189 | available when the device is suspended or resumed, the device tree is | ||
190 | walked in a bottom-up order to suspend devices. A top-down order is | ||
191 | used to resume those devices. | ||
192 | |||
193 | The ordering of the device tree is defined by the order in which devices | ||
194 | get registered: a child can never be registered, probed or resumed before | ||
195 | its parent; and can't be removed or suspended after that parent. | ||
196 | |||
197 | The policy is that the device tree should match hardware bus topology. | ||
198 | (Or at least the control bus, for devices which use multiple busses.) | ||
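As a rough illustration of the ordering rule above, the following userspace C sketch keeps devices in registration order, then walks that list backwards to suspend and forwards to resume. The demo_* names and the device list are hypothetical and are not kernel interfaces; the sketch only models the ordering, under the assumption that a parent is always registered before its children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered device; not the kernel's struct device. */
struct demo_device {
	const char *name;
	int parent;		/* index of the parent in registration order, or -1 */
};

/* Devices in registration order: a parent always precedes its children. */
static const struct demo_device demo_devices[] = {
	{ "pci-bridge", -1 },
	{ "usb-host",    0 },
	{ "usb-hub",     1 },
	{ "usb-disk",    2 },
};
#define DEMO_COUNT (sizeof(demo_devices) / sizeof(demo_devices[0]))

/* Fill order[] with device indexes in suspend order (children first). */
static size_t demo_suspend_order(size_t order[], size_t max)
{
	size_t n = 0;

	assert(order != NULL);
	for (size_t i = DEMO_COUNT; i > 0 && n < max; i--)
		order[n++] = i - 1;
	return n;
}

/* Fill order[] with device indexes in resume order (parents first). */
static size_t demo_resume_order(size_t order[], size_t max)
{
	size_t n = 0;

	assert(order != NULL);
	for (size_t i = 0; i < DEMO_COUNT && n < max; i++)
		order[n++] = i;
	return n;
}

/* Return 1 if every device in order[] appears after its parent, else 0. */
static int demo_parents_first(const size_t order[], size_t n)
{
	assert(order != NULL);
	for (size_t p = 0; p < n; p++) {
		int parent = demo_devices[order[p]].parent;
		int seen = 0;

		if (parent < 0)
			continue;
		for (size_t q = 0; q < p; q++)
			if (order[q] == (size_t)parent)
				seen = 1;
		if (!seen)
			return 0;
	}
	return 1;
}
```

Because the list is built at registration time, no separate tree traversal is needed in this model: reversing the walk is enough to reach every child before its parent.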
199 | |||
200 | |||
201 | Suspending Devices | ||
202 | ------------------ | ||
203 | Suspending a given device is done in several phases. Suspending the | ||
204 | system always includes every phase, executing calls for every device | ||
205 | before the next phase begins. Not all busses or classes support all | ||
206 | these callbacks; and not all drivers use all the callbacks. | ||
207 | |||
208 | The phases are seen by driver notifications issued in this order: | ||
209 | |||
210 | 1 class.suspend(dev, message) is called after tasks are frozen, for | ||
211 | devices associated with a class that has such a method. This | ||
212 | method may sleep. | ||
213 | |||
214 | Since I/O activity usually comes from such higher layers, this is | ||
215 | a good place to quiesce all drivers of a given type (and keep such | ||
216 | code out of those drivers). | ||
217 | |||
218 | 2 bus.suspend(dev, message) is called next. This method may sleep, | ||
219 | and is often morphed into a device driver call with bus-specific | ||
220 | parameters and/or rules. | ||
221 | |||
222 | This call should handle parts of device suspend logic that require | ||
223 | sleeping. It probably does work to quiesce the device which hasn't | ||
224 | been abstracted into class.suspend() or bus.suspend_late(). | ||
225 | |||
226 | 3 bus.suspend_late(dev, message) is called with IRQs disabled, and | ||
227 | with only one CPU active. Until the bus.resume_early() phase | ||
228 | completes (see later), IRQs are not enabled again. This method | ||
229 | won't be exposed by all busses; for message based busses like USB, | ||
230 | I2C, or SPI, device interactions normally require IRQs. This bus | ||
231 | call may be morphed into a driver call with bus-specific parameters. | ||
232 | |||
233 | This call might save low level hardware state that might otherwise | ||
234 | be lost in the upcoming low power state, and actually put the | ||
235 | device into a low power state ... so that in some cases the device | ||
236 | may stay partly usable until this late. This "late" call may also | ||
237 | help when coping with hardware that behaves badly. | ||
238 | |||
239 | The pm_message_t parameter is currently used to refine those semantics | ||
240 | (described later). | ||
241 | |||
242 | At the end of those phases, drivers should normally have stopped all I/O | ||
243 | transactions (DMA, IRQs), saved enough state that they can re-initialize | ||
244 | or restore previous state (as needed by the hardware), and placed the | ||
245 | device into a low-power state. On many platforms they will also use | ||
246 | clk_disable() to gate off one or more clock sources; sometimes they will | ||
247 | also switch off power supplies, or reduce voltages. Drivers which have | ||
248 | runtime PM support may already have performed some or all of the steps | ||
249 | needed to prepare for the upcoming system sleep state. | ||
250 | |||
251 | When a driver sees device_can_wakeup(dev) return true, it should make sure | ||
252 | to use the relevant hardware signals to trigger a system wakeup event. | ||
253 | For example, enable_irq_wake() might identify GPIO signals hooked up to | ||
254 | a switch or other external hardware, and pci_enable_wake() does something | ||
255 | similar for PCI's PME# signal. | ||
256 | |||
257 | If a driver (or bus, or class) fails its suspend method, the system won't | ||
258 | enter the desired low power state; it will resume all the devices it's | ||
259 | suspended so far. | ||
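The failure handling described above can be sketched in userspace C as follows. This is a simplified single-phase model, not the PM core's code: struct demo_dev and the demo_* functions are hypothetical, and the array is assumed to be in registration order (parents first).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model for illustration; not a kernel structure. */
struct demo_dev {
	const char *name;
	int fail_suspend;	/* nonzero: the suspend method reports an error */
	int suspended;		/* state as tracked by this sketch */
};

/* Stand-in for a suspend method: 0 on success, negative on error. */
static int demo_suspend_one(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (dev->fail_suspend)
		return -1;
	dev->suspended = 1;
	return 0;
}

/* Stand-in for the matching resume method. */
static void demo_resume_one(struct demo_dev *dev)
{
	assert(dev != NULL);
	dev->suspended = 0;
}

/*
 * Suspend devs[n-1] down to devs[0] (children before parents).  If one
 * suspend fails, resume the devices suspended so far (parents before
 * children) and return the error; the sleep state is not entered.
 */
static int demo_suspend_all(struct demo_dev devs[], size_t n)
{
	for (size_t i = n; i > 0; i--) {
		int err = demo_suspend_one(&devs[i - 1]);

		if (err) {
			for (size_t j = i; j < n; j++)
				demo_resume_one(&devs[j]);
			return err;
		}
	}
	return 0;
}
```

A driver author can read this as a reminder that resume methods may run even though the system never slept, so they should not assume hardware state was lost.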
260 | |||
261 | Note that drivers may need to perform different actions based on the target | ||
262 | system lowpower/sleep state. At this writing, there are only platform | ||
263 | specific APIs through which drivers could determine those target states. | ||
264 | |||
265 | |||
266 | Device Low Power (suspend) States | ||
267 | --------------------------------- | ||
268 | Device low-power states aren't very standard. One device might only handle | ||
269 | "on" and "off", while another might support a dozen different versions of | ||
270 | "on" (how many engines are active?), plus a state that gets back to "on" | ||
271 | faster than from a full "off". | ||
272 | |||
273 | Some busses define rules about what different suspend states mean. PCI | ||
274 | gives one example: after the suspend sequence completes, a non-legacy | ||
275 | PCI device may not perform DMA or issue IRQs, and any wakeup events it | ||
276 | issues would be issued through the PME# bus signal. Plus, there are | ||
277 | several PCI-standard device states, some of which are optional. | ||
278 | |||
279 | In contrast, integrated system-on-chip processors often use IRQs as the | ||
280 | wakeup event sources (so drivers would call enable_irq_wake()) and might | ||
281 | be able to treat DMA completion as a wakeup event (sometimes DMA can stay | ||
282 | active too; only the CPU and some peripherals would sleep). | ||
283 | |||
284 | Some details here may be platform-specific. Systems may have devices that | ||
285 | can be fully active in certain sleep states, such as an LCD display that's | ||
286 | refreshed using DMA while most of the system is sleeping lightly ... and | ||
287 | its frame buffer might even be updated by a DSP or other non-Linux CPU while | ||
288 | the Linux control processor stays idle. | ||
289 | |||
290 | Moreover, the specific actions taken may depend on the target system state. | ||
291 | One target system state might allow a given device to be very operational; | ||
292 | another might require a hard shut down with re-initialization on resume. | ||
293 | And two different target systems might use the same device in different | ||
294 | ways; the aforementioned LCD might be active in one product's "standby", | ||
295 | but a different product using the same SOC might work differently. | ||
296 | |||
297 | |||
298 | Meaning of pm_message_t.event | ||
299 | ----------------------------- | ||
300 | Parameters to suspend calls include the device affected and a message of | ||
301 | type pm_message_t, which has one field: the event. If a driver does not | ||
302 | recognize the event code, suspend calls may abort the request and return | ||
303 | a negative errno. However, most drivers will be fine if they implement | ||
304 | PM_EVENT_SUSPEND semantics for all messages. | ||
305 | |||
306 | The event codes are used to refine the goal of suspending the device, and | ||
307 | mostly matter when creating or resuming system memory image snapshots, as | ||
308 | used with suspend-to-disk: | ||
309 | |||
310 | PM_EVENT_SUSPEND -- quiesce the driver and put hardware into a low-power | ||
311 | state. When used with system sleep states like "suspend-to-RAM" or | ||
312 | "standby", the upcoming resume() call will often be able to rely on | ||
313 | state kept in hardware, or issue system wakeup events. When used | ||
314 | instead with suspend-to-disk, few devices support this capability; | ||
315 | most are completely powered off. | ||
316 | |||
317 | PM_EVENT_FREEZE -- quiesce the driver, but don't necessarily change into | ||
318 | any low power mode. A system snapshot is about to be taken, often | ||
319 | followed by a call to the driver's resume() method. Neither wakeup | ||
320 | events nor DMA are allowed. | ||
321 | |||
322 | PM_EVENT_PRETHAW -- quiesce the driver, knowing that the upcoming resume() | ||
323 | will restore a suspend-to-disk snapshot from a different kernel image. | ||
324 | Drivers that are smart enough to look at their hardware state during | ||
325 | resume() processing need that state to be correct ... a PRETHAW could | ||
326 | be used to invalidate that state (by resetting the device), like a | ||
327 | shutdown() invocation would before a kexec() or system halt. Other | ||
328 | drivers might handle this the same way as PM_EVENT_FREEZE. Neither | ||
329 | wakeup events nor DMA are allowed. | ||
330 | |||
331 | To enter "standby" (ACPI S1) or "Suspend to RAM" (STR, ACPI S3) states, or | ||
332 | the similarly named APM states, only PM_EVENT_SUSPEND is used; for "Suspend | ||
333 | to Disk" (STD, hibernate, ACPI S4), all of those event codes are used. | ||
334 | |||
335 | There's also PM_EVENT_ON, a value which never appears as a suspend event | ||
336 | but is sometimes used to record the "not suspended" device state. | ||
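One way a driver's suspend method might dispatch on the event code is sketched below as a userspace C model. The pm_message_t typedef and the PM_EVENT_* values are stand-ins defined here so the example compiles outside the kernel (the numeric values are assumptions of this sketch), and struct demo_hw and demo_suspend() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in definitions so the sketch compiles outside the kernel;
 * the numeric values are assumptions of this sketch.
 */
typedef struct { int event; } pm_message_t;
#define PM_EVENT_FREEZE		1
#define PM_EVENT_SUSPEND	2
#define PM_EVENT_PRETHAW	3

/* Hypothetical per-device state tracked by an imaginary driver. */
struct demo_hw {
	int quiesced;	/* I/O has been stopped */
	int low_power;	/* hardware was put into a low power state */
	int reset;	/* hardware state was invalidated */
};

static int demo_suspend(struct demo_hw *hw, pm_message_t msg)
{
	assert(hw != NULL);

	hw->quiesced = 1;	/* every event code requires quiescing */

	switch (msg.event) {
	case PM_EVENT_FREEZE:
		/* snapshot is about to be taken: no power state change needed */
		break;
	case PM_EVENT_PRETHAW:
		/* make sure the upcoming resume() sees known hardware state */
		hw->reset = 1;
		break;
	case PM_EVENT_SUSPEND:
	default:
		/* unrecognized codes get PM_EVENT_SUSPEND semantics */
		hw->low_power = 1;
		break;
	}
	return 0;
}
```

Falling back to PM_EVENT_SUSPEND behavior in the default case follows the advice above that most drivers will be fine implementing those semantics for all messages; a stricter driver could return a negative errno there instead.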
337 | |||
338 | |||
339 | Resuming Devices | ||
340 | ---------------- | ||
341 | Resuming is done in multiple phases, much like suspending, with all | ||
342 | devices processing each phase's calls before the next phase begins. | ||
343 | |||
344 | The phases are seen by driver notifications issued in this order: | ||
345 | |||
346 | 1 bus.resume_early(dev) is called with IRQs disabled, and with | ||
347 | only one CPU active. As with bus.suspend_late(), this method | ||
348 | won't be supported on busses that require IRQs in order to | ||
349 | interact with devices. | ||
350 | |||
351 | This reverses the effects of bus.suspend_late(). | ||
352 | |||
353 | 2 bus.resume(dev) is called next. This may be morphed into a device | ||
354 | driver call with bus-specific parameters; implementations may sleep. | ||
355 | |||
356 | This reverses the effects of bus.suspend(). | ||
357 | |||
358 | 3 class.resume(dev) is called for devices associated with a class | ||
359 | that has such a method. Implementations may sleep. | ||
360 | |||
361 | This reverses the effects of class.suspend(), and would usually | ||
362 | reactivate the device's I/O queue. | ||
363 | |||
364 | At the end of those phases, drivers should normally be as functional as | ||
365 | they were before suspending: I/O can be performed using DMA and IRQs, and | ||
366 | the relevant clocks are gated on. The device need not be "fully on"; it | ||
367 | might be in a runtime lowpower/suspend state that acts as if it were. | ||
368 | |||
369 | However, the details here may again be platform-specific. For example, | ||
370 | some systems support multiple "run" states, and the mode in effect at | ||
371 | the end of resume() might not be the one which preceded suspension. | ||
372 | That means availability of certain clocks or power supplies changed, | ||
373 | which could easily affect how a driver works. | ||
374 | |||
375 | |||
376 | Drivers need to be able to handle hardware which has been reset since the | ||
377 | suspend methods were called, for example by complete reinitialization. | ||
378 | This may be the hardest part, and the one most protected by NDA'd documents | ||
379 | and chip errata. It's simplest if the hardware state hasn't changed since | ||
380 | the suspend() was called, but that can't always be guaranteed. | ||
381 | |||
382 | Drivers must also be prepared to notice that the device has been removed | ||
383 | while the system was powered off, whenever that's physically possible. | ||
384 | PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses | ||
385 | where common Linux platforms will see such removal. Details of how drivers | ||
386 | will notice and handle such removals are currently bus-specific, and often | ||
387 | involve a separate thread. | ||
13 | 388 | ||
14 | The methods to suspend and resume devices reside in struct bus_type: | ||
15 | 389 | ||
16 | struct bus_type { | 390 | Note that the bus-specific runtime PM wakeup mechanism can exist, and might |
17 | ... | 391 | be defined to share some of the same driver code as for system wakeup. For |
18 | int (*suspend)(struct device * dev, pm_message_t state); | 392 | example, a bus-specific device driver's resume() method might be used there, |
19 | int (*resume)(struct device * dev); | 393 | so it wouldn't only be called from bus.resume() during system-wide wakeup. |
20 | }; | 394 | See bus-specific information about how runtime wakeup events are handled. |
21 | 395 | ||
22 | Each bus driver is responsible implementing these methods, translating | ||
23 | the call into a bus-specific request and forwarding the call to the | ||
24 | bus-specific drivers. For example, PCI drivers implement suspend() and | ||
25 | resume() methods in struct pci_driver. The PCI core is simply | ||
26 | responsible for translating the pointers to PCI-specific ones and | ||
27 | calling the low-level driver. | ||
28 | |||
29 | This is done to a) ease transition to the new power management methods | ||
30 | and leverage the existing PM code in various bus drivers; b) allow | ||
31 | buses to implement generic and default PM routines for devices, and c) | ||
32 | make the flow of execution obvious to the reader. | ||
33 | |||
34 | |||
35 | System Power Management | ||
36 | |||
37 | When the system enters a low-power state, the device tree is walked in | ||
38 | a depth-first fashion to transition each device into a low-power | ||
39 | state. The ordering of the device tree is guaranteed by the order in | ||
40 | which devices get registered - children are never registered before | ||
41 | their ancestors, and devices are placed at the back of the list when | ||
42 | registered. By walking the list in reverse order, we are guaranteed to | ||
43 | suspend devices in the proper order. | ||
44 | |||
45 | Devices are suspended once with interrupts enabled. Drivers are | ||
46 | expected to stop I/O transactions, save device state, and place the | ||
47 | device into a low-power state. Drivers may sleep, allocate memory, | ||
48 | etc. at will. | ||
49 | |||
50 | Some devices are broken and will inevitably have problems powering | ||
51 | down or disabling themselves with interrupts enabled. For these | ||
52 | special cases, they may return -EAGAIN. This will put the device on a | ||
53 | list to be taken care of later. When interrupts are disabled, before | ||
54 | we enter the low-power state, their drivers are called again to put | ||
55 | their device to sleep. | ||
56 | |||
57 | On resume, the devices that returned -EAGAIN will be called to power | ||
58 | themselves back on with interrupts disabled. Once interrupts have been | ||
59 | re-enabled, the rest of the drivers will be called to resume their | ||
60 | devices. On resume, a driver is responsible for powering back on each | ||
61 | device, restoring state, and re-enabling I/O transactions for that | ||
62 | device. | ||
63 | 396 | ||
397 | System Devices | ||
398 | -------------- | ||
64 | System devices follow a slightly different API, which can be found in | 399 | System devices follow a slightly different API, which can be found in |
65 | 400 | ||
66 | include/linux/sysdev.h | 401 | include/linux/sysdev.h |
67 | drivers/base/sys.c | 402 | drivers/base/sys.c |
68 | 403 | ||
69 | System devices will only be suspended with interrupts disabled, and | 404 | System devices will only be suspended with interrupts disabled, and after |
70 | after all other devices have been suspended. On resume, they will be | 405 | all other devices have been suspended. On resume, they will be resumed |
71 | resumed before any other devices, and also with interrupts disabled. | 406 | before any other devices, and also with interrupts disabled. |
72 | 407 | ||
408 | That is, IRQs are disabled, the suspend_late() phase begins, then the | ||
409 | sysdev_driver.suspend() phase, and the system enters a sleep state. Then | ||
410 | the sysdev_driver.resume() phase begins, followed by the resume_early() | ||
411 | phase, after which IRQs are enabled. | ||
73 | 412 | ||
74 | Runtime Power Management | 413 | Code to actually enter and exit the system-wide low power state sometimes |
75 | 414 | involves hardware details that are only known to the boot firmware, and | |
76 | Many devices are able to dynamically power down while the system is | 415 | may leave a CPU running software (from SRAM or flash memory) that monitors |
77 | still running. This feature is useful for devices that are not being | 416 | the system and manages its wakeup sequence. |
78 | used, and can offer significant power savings on a running system. | ||
79 | |||
80 | In each device's directory, there is a 'power' directory, which | ||
81 | contains at least a 'state' file. Reading from this file displays what | ||
82 | power state the device is currently in. Writing to this file initiates | ||
83 | a transition to the specified power state, which must be a decimal in | ||
84 | the range 1-3, inclusive; or 0 for 'On'. | ||
85 | 417 | ||
86 | The PM core will call the ->suspend() method in the bus_type object | ||
87 | that the device belongs to if the specified state is not 0, or | ||
88 | ->resume() if it is. | ||
89 | 418 | ||
90 | Nothing will happen if the specified state is the same state the | 419 | Runtime Power Management |
91 | device is currently in. | 420 | ======================== |
92 | 421 | Many devices are able to dynamically power down while the system is still | |
93 | If the device is already in a low-power state, and the specified state | 422 | running. This feature is useful for devices that are not being used, and |
94 | is another, but different, low-power state, the ->resume() method will | 423 | can offer significant power savings on a running system. These devices |
95 | first be called to power the device back on, then ->suspend() will be | 424 | often support a range of runtime power states, which might use names such |
96 | called again with the new state. | 425 | as "off", "sleep", "idle", "active", and so on. Those states will in some |
97 | 426 | cases (like PCI) be partially constrained by a bus the device uses, and will | |
98 | The driver is responsible for saving the working state of the device | 427 | usually include hardware states that are also used in system sleep states. |
99 | and putting it into the low-power state specified. If this was | 428 | |
100 | successful, it returns 0, and the device's power_state field is | 429 | However, note that if a driver puts a device into a runtime low power state |
101 | updated. | 430 | and the system then goes into a system-wide sleep state, it normally ought |
102 | 431 | to resume into that runtime low power state rather than "full on". Such | |
103 | The driver must take care to know whether or not it is able to | 432 | distinctions would be part of the driver-internal state machine for that |
104 | properly resume the device, including all step of reinitialization | 433 | hardware; the whole point of runtime power management is to be sure that |
105 | necessary. (This is the hardest part, and the one most protected by | 434 | drivers are decoupled in that way from the state machine governing phases |
106 | NDA'd documents). | 435 | of the system-wide power/sleep state transitions. |
107 | 436 | ||
108 | The driver must also take care not to suspend a device that is | 437 | |
109 | currently in use. It is their responsibility to provide their own | 438 | Power Saving Techniques |
110 | exclusion mechanisms. | 439 | ----------------------- |
111 | 440 | Normally runtime power management is handled by the drivers without specific | |
112 | The runtime power transition happens with interrupts enabled. If a | 441 | userspace or kernel intervention, by device-aware use of techniques like: |
113 | device cannot support being powered down with interrupts, it may | 442 | |
114 | return -EAGAIN (as it would during a system power management | 443 | Using information provided by other system layers |
115 | transition), but it will _not_ be called again, and the transaction | 444 | - stay deeply "off" except between open() and close() |
116 | will fail. | 445 | - if transceiver/PHY indicates "nobody connected", stay "off" |
117 | 446 | - application protocols may include power commands or hints | |
118 | There is currently no way to know what states a device or driver | 447 | |
119 | supports a priori. This will change in the future. | 448 | Using fewer CPU cycles |
120 | 449 | - using DMA instead of PIO | |
121 | pm_message_t meaning | 450 | - removing timers, or making them lower frequency |
122 | 451 | - shortening "hot" code paths | |
123 | pm_message_t has two fields. event ("major"), and flags. If driver | 452 | - eliminating cache misses |
124 | does not know event code, it aborts the request, returning error. Some | 453 | - (sometimes) offloading work to device firmware |
125 | drivers may need to deal with special cases based on the actual type | 454 | |
126 | of suspend operation being done at the system level. This is why | 455 | Reducing other resource costs |
127 | there are flags. | 456 | - gating off unused clocks in software (or hardware) |
128 | 457 | - switching off unused power supplies | |
129 | Event codes are: | 458 | - eliminating (or delaying/merging) IRQs |
130 | 459 | - tuning DMA to use word and/or burst modes | |
131 | ON -- no need to do anything except special cases like broken | 460 | |
132 | HW. | 461 | Using device-specific low power states |
133 | 462 | - using lower voltages | |
134 | # NOTIFICATION -- pretty much same as ON? | 463 | - avoiding needless DMA transfers |
135 | 464 | ||
136 | FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from | 465 | Read your hardware documentation carefully to see the opportunities that |
137 | scratch. That probably means stop accepting upstream requests, the | 466 | may be available. If you can, measure the actual power usage and check |
138 | actual policy of what to do with them being specific to a given | 467 | it against the budget established for your project. |
139 | driver. It's acceptable for a network driver to just drop packets | 468 | |
140 | while a block driver is expected to block the queue so no request is | 469 | |
141 | lost. (Use IDE as an example on how to do that). FREEZE requires no | 470 | Examples: USB hosts, system timer, system CPU |
142 | power state change, and it's expected for drivers to be able to | 471 | ---------------------------------------------- |
143 | quickly transition back to operating state. | 472 | USB host controllers make interesting, if complex, examples. In many cases |
144 | 473 | these have no work to do: no USB devices are connected, or all of them are | |
145 | SUSPEND -- like FREEZE, but also put hardware into low-power state. If | 474 | in the USB "suspend" state. Linux host controller drivers can then disable |
146 | there's need to distinguish several levels of sleep, additional flag | 475 | periodic DMA transfers that would otherwise be a constant power drain on the |
147 | is probably best way to do that. | 476 | memory subsystem, and enter a suspend state. In power-aware controllers, |
148 | 477 | entering that suspend state may disable the clock used with USB signaling, | |
149 | Transitions are only from a resumed state to a suspended state, never | 478 | saving a certain amount of power. |
150 | between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen, | 479 | |
151 | FREEZE -> SUSPEND or SUSPEND -> FREEZE can not). | 480 | The controller will be woken from that state (with an IRQ) by changes to the |
152 | 481 | signal state on the data lines of a given port, for example by an existing | |
153 | All events are: | 482 | peripheral requesting "remote wakeup" or by plugging a new peripheral. The |
154 | 483 | same wakeup mechanism usually works from "standby" sleep states, and on some | |
155 | [NOTE NOTE NOTE: If you are driver author, you should not care; you | 484 | systems also from "suspend to RAM" (or even "suspend to disk") states. |
156 | should only look at event, and ignore flags.] | 485 | (Except that ACPI may be involved instead of normal IRQs, on some hardware.) |
157 | 486 | ||
158 | #Prepare for suspend -- userland is still running but we are going to | 487 | System devices like timers and CPUs may have special roles in the platform |
159 | #enter suspend state. This gives drivers chance to load firmware from | 488 | power management scheme. For example, system timers using a "dynamic tick" |
160 | #disk and store it in memory, or do other activities taht require | 489 | approach don't just save CPU cycles (by eliminating needless timer IRQs), |
161 | #operating userland, ability to kmalloc GFP_KERNEL, etc... All of these | 490 | but they may also open the door to using lower power CPU "idle" states that |
162 | #are forbiden once the suspend dance is started.. event = ON, flags = | 491 | cost more than a jiffie to enter and exit. On x86 systems these are states |
163 | #PREPARE_TO_SUSPEND | 492 | like "C3"; note that periodic DMA transfers from a USB host controller will |
164 | 493 | also prevent entry to a C3 state, much like a periodic timer IRQ. | |
165 | Apm standby -- prepare for APM event. Quiesce devices to make life | 494 | |
166 | easier for APM BIOS. event = FREEZE, flags = APM_STANDBY | 495 | That kind of runtime mechanism interaction is common. "System On Chip" (SOC) |
167 | 496 | processors often have low power idle modes that can't be entered unless | |
168 | Apm suspend -- same as APM_STANDBY, but it we should probably avoid | 497 | certain medium-speed clocks (often 12 or 48 MHz) are gated off. When the |
169 | spinning down disks. event = FREEZE, flags = APM_SUSPEND | 498 | drivers gate those clocks effectively, then the system idle task may be able |
170 | 499 | to use the lower power idle modes and thereby increase battery life. | |
171 | System halt, reboot -- quiesce devices to make life easier for BIOS. event | 500 | |
172 | = FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT | 501 | If the CPU can have a "cpufreq" driver, there also may be opportunities |
173 | 502 | to shift to lower voltage settings and reduce the power cost of executing | |
174 | System shutdown -- at least disks need to be spun down, or data may be | 503 | a given number of instructions. (Without voltage adjustment, it's rare |
175 | lost. Quiesce devices, just to make life easier for BIOS. event = | 504 | for cpufreq to save much power; the cost-per-instruction must go down.) |
176 | FREEZE, flags = SYSTEM_SHUTDOWN | 505 | |
177 | 506 | ||
178 | Kexec -- turn off DMAs and put hardware into some state where new | 507 | /sys/devices/.../power/state files |
179 | kernel can take over. event = FREEZE, flags = KEXEC | 508 | ================================== |
180 | 509 | For now you can also test some of this functionality using sysfs. | |
181 | Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake | 510 | |
182 | may need to be enabled on some devices. This actually has at least 3 | 511 | DEPRECATED: USE "power/state" ONLY FOR DRIVER TESTING, AND |
183 | subtypes, system can reboot, enter S4 and enter S5 at the end of | 512 | AVOID USING dev->power.power_state IN DRIVERS. |
184 | swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT, | 513 | |
185 | SYSTEM_SHUTDOWN, SYSTEM_S4 | 514 | THESE WILL BE REMOVED. IF THE "power/state" FILE GETS REPLACED, |
186 | 515 | IT WILL BECOME SOMETHING COUPLED TO THE BUS OR DRIVER. | |
187 | Suspend to ram -- put devices into low power state. event = SUSPEND, | 516 | |
188 | flags = SUSPEND_TO_RAM | 517 | In each device's directory, there is a 'power' directory, which contains |
189 | 518 | at least a 'state' file. The value of this field is effectively boolean, | |
190 | Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put | 519 | PM_EVENT_ON or PM_EVENT_SUSPEND. |
191 | devices into low power mode, but you must be able to reinitialize | 520 | |
192 | device from scratch in resume method. This has two flavors, its done | 521 | * Reading from this file displays a value corresponding to |
193 | once on suspending kernel, once on resuming kernel. event = FREEZE, | 522 | the power.power_state.event field. All nonzero values are |
194 | flags = DURING_SUSPEND or DURING_RESUME | 523 | displayed as "2", corresponding to a low power state; zero |
195 | 524 | is displayed as "0", corresponding to normal operation. | |
196 | Device detach requested from /sys -- deinitialize device; proably same as | 525 | |
197 | SYSTEM_SHUTDOWN, I do not understand this one too much. probably event | 526 | * Writing to this file initiates a transition using the |
198 | = FREEZE, flags = DEV_DETACH. | 527 | specified event code number; only '0', '2', and '3' are |
199 | 528 | accepted (without a newline); '2' and '3' are both | |
200 | #These are not really events sent: | 529 | mapped to PM_EVENT_SUSPEND. |
201 | # | 530 | |
202 | #System fully on -- device is working normally; this is probably never | 531 | On writes, the PM core relies on that recorded event code and the device/bus |
203 | #passed to suspend() method... event = ON, flags = 0 | 532 | capabilities to determine whether it uses a partial suspend() or resume() |
204 | # | 533 | sequence to change things so that the recorded event corresponds to the |
205 | #Ready after resume -- userland is now running, again. Time to free any | 534 | numeric parameter. |
206 | #memory you ate during prepare to suspend... event = ON, flags = | 535 | |
207 | #READY_AFTER_RESUME | 536 | - If the bus requires the irqs-disabled suspend_late()/resume_early() |
208 | # | 537 | phases, writes fail because those operations are not supported here. |
538 | |||
539 | - If the recorded value is the expected value, nothing is done. | ||
540 | |||
541 | - If the recorded value is nonzero, the device is partially resumed, | ||
542 | using the bus.resume() and/or class.resume() methods. | ||
543 | |||
544 | - If the target value is nonzero, the device is partially suspended, | ||
545 | using the class.suspend() and/or bus.suspend() methods and the | ||
546 | PM_EVENT_SUSPEND message. | ||
547 | |||
548 | Drivers have no way to tell whether their suspend() and resume() calls | ||
549 | have come through the sysfs power/state file or as part of entering a | ||
550 | system sleep state, except that when accessed through sysfs the normal | ||
551 | parent/child sequencing rules are ignored. Drivers (such as bus, bridge, | ||
552 | or hub drivers) which expose child devices may need to enforce those rules | ||
553 | on their own. | ||
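The write-side rules for power/state listed above can be summarised as a small decision function. This is only a sketch of the documented behaviour, not the PM core's code: the helper names, the action bitmask, and the numeric event values are assumptions made for illustration.

```c
#include <string.h>

/* Event codes; the numeric values are assumptions for this sketch. */
#define PM_EVENT_ON      0
#define PM_EVENT_SUSPEND 2

/* Hypothetical action flags reported by the model. */
#define ACT_RESUME  0x1  /* partial resume via bus/class resume() */
#define ACT_SUSPEND 0x2  /* partial suspend with PM_EVENT_SUSPEND */

/* Map the written string to a target event, or -1 if it is rejected. */
static int parse_state(const char *buf)
{
    if (strcmp(buf, "0") == 0)
        return PM_EVENT_ON;
    if (strcmp(buf, "2") == 0 || strcmp(buf, "3") == 0)
        return PM_EVENT_SUSPEND;
    return -1;  /* anything else, including a trailing newline */
}

/*
 * Model a write to power/state.  Returns a bitmask of actions,
 * 0 when nothing needs doing, or -1 when the write fails.
 */
static int state_write(int recorded, const char *buf, int needs_late_early)
{
    int target = parse_state(buf);
    int actions = 0;

    if (target < 0)
        return -1;
    if (needs_late_early)   /* suspend_late()/resume_early() unsupported */
        return -1;
    if (recorded == target)
        return 0;
    if (recorded != 0)
        actions |= ACT_RESUME;
    if (target != 0)
        actions |= ACT_SUSPEND;
    return actions;
}
```

For example, state_write(PM_EVENT_ON, "3", 0) yields ACT_SUSPEND, since '3' is mapped to PM_EVENT_SUSPEND, while the same call with a trailing newline in the string is rejected.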
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt index 4117802af0f8..a66bec222b16 100644 --- a/Documentation/power/interface.txt +++ b/Documentation/power/interface.txt | |||
@@ -52,3 +52,18 @@ suspend image will be as small as possible. | |||
52 | 52 | ||
53 | Reading from this file will display the current image size limit, which | 53 | Reading from this file will display the current image size limit, which |
54 | is set to 500 MB by default. | 54 | is set to 500 MB by default. |
55 | |||
56 | /sys/power/pm_trace controls the code which saves the last PM event point in | ||
57 | the RTC across reboots, so that you can debug a machine that just hangs | ||
58 | during suspend (or more commonly, during resume). Namely, the RTC is only | ||
59 | used to save the last PM event point if this file contains '1'. Initially it | ||
60 | contains '0' which may be changed to '1' by writing a string representing a | ||
61 | nonzero integer into it. | ||
62 | |||
63 | To use this debugging feature you should attempt to suspend the machine, then | ||
64 | reboot it and run | ||
65 | |||
66 | dmesg -s 1000000 | grep 'hash matches' | ||
67 | |||
68 | CAUTION: Using it will cause your machine's real-time (CMOS) clock to be | ||
69 | set to a random invalid time after a resume. | ||
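Enabling the trace amounts to writing a nonzero integer string into the file before suspending. A minimal userspace sketch follows; the helper name pm_trace_enable() is hypothetical, and the path is passed in so that the same code can be exercised against an ordinary file. On a real system the caller would pass /sys/power/pm_trace and needs sufficient privileges.

```c
#include <stdio.h>

/*
 * Write "1" to the given pm_trace file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int pm_trace_enable(const char *path)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fputs("1", f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes, so a late write error shows up here. */
    return fclose(f) == 0 ? 0 : -1;
}
```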
diff --git a/Documentation/sh/new-machine.txt b/Documentation/sh/new-machine.txt index eb2dd2e6993b..73988e0d112b 100644 --- a/Documentation/sh/new-machine.txt +++ b/Documentation/sh/new-machine.txt | |||
@@ -41,11 +41,6 @@ Board-specific code: | |||
41 | | | 41 | | |
42 | .. more boards here ... | 42 | .. more boards here ... |
43 | 43 | ||
44 | It should also be noted that each board is required to have some certain | ||
45 | headers. At the time of this writing, io.h is the only thing that needs | ||
46 | to be provided for each board, and can generally just reference generic | ||
47 | functions (with the exception of isa_port2addr). | ||
48 | |||
49 | Next, for companion chips: | 44 | Next, for companion chips: |
50 | . | 45 | . |
51 | `-- arch | 46 | `-- arch |
@@ -104,12 +99,13 @@ and then populate that with sub-directories for each member of the family. | |||
104 | Both the Solution Engine and the hp6xx boards are an example of this. | 99 | Both the Solution Engine and the hp6xx boards are an example of this. |
105 | 100 | ||
106 | After you have set up your new arch/sh/boards/ directory, remember that you | 101 | After you have set up your new arch/sh/boards/ directory, remember that you |
107 | also must add a directory in include/asm-sh for headers localized to this | 102 | should also add a directory in include/asm-sh for headers localized to this |
108 | board. In order to interoperate seamlessly with the build system, it's best | 103 | board (if there are going to be more than one). In order to interoperate |
109 | to have this directory the same as the arch/sh/boards/ directory name, | 104 | seamlessly with the build system, it's best to have this directory the same |
110 | though if your board is again part of a family, the build system has ways | 105 | as the arch/sh/boards/ directory name, though if your board is again part of |
111 | of dealing with this, and you can feel free to name the directory after | 106 | a family, the build system has ways of dealing with this (via incdir-y |
112 | the family member itself. | 107 | overloading), and you can feel free to name the directory after the family |
108 | member itself. | ||
113 | 109 | ||
114 | There are a few things that each board is required to have, both in the | 110 | There are a few things that each board is required to have, both in the |
115 | arch/sh/boards and the include/asm-sh/ hierarchy. In order to better | 111 | arch/sh/boards and the include/asm-sh/ hierarchy. In order to better |
@@ -122,6 +118,7 @@ might look something like: | |||
122 | * arch/sh/boards/vapor/setup.c - Setup code for imaginary board | 118 | * arch/sh/boards/vapor/setup.c - Setup code for imaginary board |
123 | */ | 119 | */ |
124 | #include <linux/init.h> | 120 | #include <linux/init.h> |
121 | #include <asm/rtc.h> /* for board_time_init() */ | ||
125 | 122 | ||
126 | const char *get_system_type(void) | 123 | const char *get_system_type(void) |
127 | { | 124 | { |
@@ -152,79 +149,57 @@ int __init platform_setup(void) | |||
152 | } | 149 | } |
153 | 150 | ||
154 | Our new imaginary board will also have to tie into the machvec in order for it | 151 | Our new imaginary board will also have to tie into the machvec in order for it |
155 | to be of any use. Currently the machvec is slowly on its way out, but is still | 152 | to be of any use. |
156 | required for the time being. As such, let us take a look at what needs to be | ||
157 | done for the machvec assignment. | ||
158 | 153 | ||
159 | machvec functions fall into a number of categories: | 154 | machvec functions fall into a number of categories: |
160 | 155 | ||
161 | - I/O functions to IO memory (inb etc) and PCI/main memory (readb etc). | 156 | - I/O functions to IO memory (inb etc) and PCI/main memory (readb etc). |
162 | - I/O remapping functions (ioremap etc) | 157 | - I/O mapping functions (ioport_map, ioport_unmap, etc). |
163 | - some initialisation functions | 158 | - a 'heartbeat' function. |
164 | - a 'heartbeat' function | 159 | - PCI and IRQ initialization routines. |
165 | - some miscellaneous flags | 160 | - Consistent allocators (for boards that need special allocators, |
166 | 161 | particularly for allocating out of some board-specific SRAM for DMA | |
167 | The tree can be built in two ways: | 162 | handles). |
168 | - as a fully generic build. All drivers are linked in, and all functions | 163 | |
169 | go through the machvec | 164 | There are machvec functions added and removed over time, so always be sure to |
170 | - as a machine specific build. In this case only the required drivers | 165 | consult include/asm-sh/machvec.h for the current state of the machvec. |
171 | will be linked in, and some macros may be redefined to not go through | 166 | |
172 | the machvec where performance is important (in particular IO functions). | 167 | The kernel will automatically wrap in generic routines for undefined function |
173 | 168 | pointers in the machvec at boot time, as machvec functions are referenced | |
174 | There are three ways in which IO can be performed: | 169 | unconditionally throughout most of the tree. Some boards have incredibly |
175 | - none at all. This is really only useful for the 'unknown' machine type, | 170 | sparse machvecs (such as the dreamcast and sh03), whereas others must define |
176 | which is designed to run on a machine about which we know nothing, and | 171 | virtually everything (rts7751r2d). |
177 | so all IO instructions do nothing. | 172 | |
178 | - fully custom. In this case all IO functions go to a machine specific | 173 | Adding a new machine is relatively trivial (using vapor as an example): |
179 | set of functions which can do what they like | 174 | |
180 | - a generic set of functions. These will cope with most situations, | 175 | If the board-specific definitions are quite minimalistic, as is the case for |
181 | and rely on a single function, mv_port2addr, which is called through the | 176 | the vast majority of boards, simply having a single board-specific header is |
182 | machine vector, and converts an IO address into a memory address, which | 177 | sufficient. |
183 | can be read from/written to directly. | 178 | |
184 | 179 | - add a new file include/asm-sh/vapor.h which contains prototypes for | |
185 | Thus adding a new machine involves the following steps (I will assume I am | ||
186 | adding a machine called vapor): | ||
187 | |||
188 | - add a new file include/asm-sh/vapor/io.h which contains prototypes for | ||
189 | any machine specific IO functions prefixed with the machine name, for | 180 | any machine specific IO functions prefixed with the machine name, for |
190 | example vapor_inb. These will be needed when filling out the machine | 181 | example vapor_inb. These will be needed when filling out the machine |
191 | vector. | 182 | vector. |
192 | 183 | ||
193 | This is the minimum that is required, however there are ample | 184 | Note that these prototypes are generated automatically by setting |
194 | opportunities to optimise this. In particular, by making the prototypes | 185 | __IO_PREFIX to something sensible. A typical example would be: |
195 | inline function definitions, it is possible to inline the function when | 186 | |
196 | building machine specific versions. Note that the machine vector | 187 | #define __IO_PREFIX vapor |
197 | functions will still be needed, so that a module built for a generic | 188 | #include <asm/io_generic.h> |
198 | setup can be loaded. | 189 | |
199 | 190 | somewhere in the board-specific header. Any boards being ported that still | |
200 | - add a new file arch/sh/boards/vapor/mach.c. This contains the definition | 191 | have a legacy io.h should remove it entirely and switch to the new model. |
201 | of the machine vector. When building the machine specific version, this | 192 | |
202 | will be the real machine vector (via an alias), while in the generic | 193 | - Add machine vector definitions to the board's setup.c. At a bare minimum, |
203 | version is used to initialise the machine vector, and then freed, by | 194 | this must be defined as something like: |
204 | making it initdata. This should be defined as: | 195 | |
205 | 196 | struct sh_machine_vector mv_vapor __initmv = { | |
206 | struct sh_machine_vector mv_vapor __initmv = { | 197 | .mv_name = "vapor", |
207 | .mv_name = "vapor", | 198 | }; |
208 | } | 199 | ALIAS_MV(vapor) |
209 | ALIAS_MV(vapor) | 200 | |
210 | 201 | - finally add a file arch/sh/boards/vapor/io.c, which contains definitions of | |
211 | - finally add a file arch/sh/boards/vapor/io.c, which contains | 202 | the machine specific io functions (if there are enough to warrant it). |
212 | definitions of the machine specific io functions. | ||
213 | |||
214 | A note about initialisation functions. Three initialisation functions are | ||
215 | provided in the machine vector: | ||
216 | - mv_arch_init - called very early on from setup_arch | ||
217 | - mv_init_irq - called from init_IRQ, after the generic SH interrupt | ||
218 | initialisation | ||
219 | - mv_init_pci - currently not used | ||
220 | |||
221 | Any other remaining functions which need to be called at start up can be | ||
222 | added to the list using the __initcalls macro (or module_init if the code | ||
223 | can be built as a module). Many generic drivers probe to see if the device | ||
224 | they are targeting is present, however this may not always be appropriate, | ||
225 | so a flag can be added to the machine vector which will be set on those | ||
226 | machines which have the hardware in question, reducing the probe to a | ||
227 | single conditional. | ||
228 | 203 | ||
229 | 3. Hooking into the Build System | 204 | 3. Hooking into the Build System |
230 | ================================ | 205 | ================================ |
@@ -303,4 +278,3 @@ which will in turn copy the defconfig for this board, run it through | |||
303 | oldconfig (prompting you for any new options since the time of creation), | 278 | oldconfig (prompting you for any new options since the time of creation), |
304 | and start you on your way to having a functional kernel for your new | 279 | and start you on your way to having a functional kernel for your new |
305 | board. | 280 | board. |
306 | |||
diff --git a/Documentation/sh/register-banks.txt b/Documentation/sh/register-banks.txt new file mode 100644 index 000000000000..a6719f2f6594 --- /dev/null +++ b/Documentation/sh/register-banks.txt | |||
@@ -0,0 +1,33 @@ | |||
1 | Notes on register bank usage in the kernel | ||
2 | ========================================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The SH-3 and SH-4 CPU families traditionally include a single partial register | ||
8 | bank (selected by SR.RB, only r0 ... r7 are banked), whereas other families | ||
9 | may have more full-featured banking or simply no such capabilities at all. | ||
10 | |||
11 | SR.RB banking | ||
12 | ------------- | ||
13 | |||
14 | In the case of this type of banking, banked registers are mapped directly to | ||
15 | r0 ... r7 if SR.RB is set to the bank we are interested in, otherwise ldc/stc | ||
16 | can still be used to reference the banked registers (as r0_bank ... r7_bank) | ||
17 | when in the context of another bank. The developer must keep the SR.RB value | ||
18 | in mind when writing code that utilizes these banked registers, for obvious | ||
19 | reasons. Userspace is also not able to poke at the bank1 values, so these can | ||
20 | be used rather effectively as scratch registers by the kernel. | ||
21 | |||
22 | Presently the kernel uses several of these registers. | ||
23 | |||
24 | - r0_bank, r1_bank (referenced as k0 and k1, used for scratch | ||
25 | registers when doing exception handling). | ||
26 | - r2_bank (used to track the EXPEVT/INTEVT code) | ||
27 | - Used by do_IRQ() and friends for doing irq mapping based off | ||
28 | of the interrupt exception vector jump table offset | ||
29 | - r6_bank (global interrupt mask) | ||
30 | - The SR.IMASK interrupt handler makes use of this to set the | ||
31 | interrupt priority level (used by local_irq_enable()) | ||
32 | - r7_bank (current) | ||
33 | |||
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 7cee90223d3a..20d0d797f539 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm: | |||
29 | - drop-caches | 29 | - drop-caches |
30 | - zone_reclaim_mode | 30 | - zone_reclaim_mode |
31 | - min_unmapped_ratio | 31 | - min_unmapped_ratio |
32 | - min_slab_ratio | ||
32 | - panic_on_oom | 33 | - panic_on_oom |
33 | 34 | ||
34 | ============================================================== | 35 | ============================================================== |
@@ -138,7 +139,6 @@ This is value ORed together of | |||
138 | 1 = Zone reclaim on | 139 | 1 = Zone reclaim on |
139 | 2 = Zone reclaim writes dirty pages out | 140 | 2 = Zone reclaim writes dirty pages out |
140 | 4 = Zone reclaim swaps pages | 141 | 4 = Zone reclaim swaps pages |
141 | 8 = Also do a global slab reclaim pass | ||
142 | 142 | ||
143 | zone_reclaim_mode is set during bootup to 1 if it is determined that pages | 143 | zone_reclaim_mode is set during bootup to 1 if it is determined that pages |
144 | from remote zones will cause a measurable performance reduction. The | 144 | from remote zones will cause a measurable performance reduction. The |
@@ -162,18 +162,13 @@ Allowing regular swap effectively restricts allocations to the local | |||
162 | node unless explicitly overridden by memory policies or cpuset | 162 | node unless explicitly overridden by memory policies or cpuset |
163 | configurations. | 163 | configurations. |
164 | 164 | ||
165 | It may be advisable to allow slab reclaim if the system makes heavy | ||
166 | use of files and builds up large slab caches. However, the slab | ||
167 | shrink operation is global, may take a long time and free slabs | ||
168 | in all nodes of the system. | ||
169 | |||
170 | ============================================================= | 165 | ============================================================= |
171 | 166 | ||
172 | min_unmapped_ratio: | 167 | min_unmapped_ratio: |
173 | 168 | ||
174 | This is available only on NUMA kernels. | 169 | This is available only on NUMA kernels. |
175 | 170 | ||
176 | A percentage of the file backed pages in each zone. Zone reclaim will only | 171 | A percentage of the total pages in each zone. Zone reclaim will only |
177 | occur if more than this percentage of pages are file backed and unmapped. | 172 | occur if more than this percentage of pages are file backed and unmapped. |
178 | This is to ensure that a minimal amount of local pages is still available for | 173 | This is to ensure that a minimal amount of local pages is still available for |
179 | file I/O even if the node is overallocated. | 174 | file I/O even if the node is overallocated. |
@@ -182,6 +177,24 @@ The default is 1 percent. | |||
182 | 177 | ||
183 | ============================================================= | 178 | ============================================================= |
184 | 179 | ||
180 | min_slab_ratio: | ||
181 | |||
182 | This is available only on NUMA kernels. | ||
183 | |||
184 | A percentage of the total pages in each zone. On Zone reclaim | ||
185 | (fallback from the local zone occurs) slabs will be reclaimed if more | ||
186 | than this percentage of pages in a zone are reclaimable slab pages. | ||
187 | This ensures that the slab growth stays under control even in NUMA | |
188 | systems that rarely perform global reclaim. | ||
189 | |||
190 | The default is 5 percent. | ||
191 | |||
192 | Note that slab reclaim is triggered in a per zone / node fashion. | ||
193 | The process of reclaiming slab memory is currently not node specific | ||
194 | and may not be fast. | ||
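The min_slab_ratio threshold described above can be sketched as a simple percentage check. This only models the rule as documented; the function and parameter names are assumptions for illustration and are not taken from the kernel source.

```c
/*
 * Return 1 if zone reclaim should also reclaim slab, i.e. if more than
 * min_slab_ratio percent of the zone's total pages are reclaimable
 * slab pages; otherwise return 0.
 */
static int slab_reclaim_wanted(unsigned long zone_pages,
                               unsigned long reclaimable_slab_pages,
                               unsigned int min_slab_ratio)
{
    unsigned long threshold = zone_pages * min_slab_ratio / 100;

    return reclaimable_slab_pages > threshold;
}
```

With the default of 5 percent, a zone of 1000 pages reclaims slab only once more than 50 of its pages are reclaimable slab.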
195 | |||
196 | ============================================================= | ||
197 | |||
185 | panic_on_oom | 198 | panic_on_oom |
186 | 199 | ||
187 | This enables or disables panic on out-of-memory feature. If this is set to 1, | 200 | This enables or disables panic on out-of-memory feature. If this is set to 1, |
diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt index 867f4c38f356..39c68f8c4e6c 100644 --- a/Documentation/usb/error-codes.txt +++ b/Documentation/usb/error-codes.txt | |||
@@ -98,13 +98,13 @@ one or more packets could finish before an error stops further endpoint I/O. | |||
98 | error, a failure to respond (often caused by | 98 | error, a failure to respond (often caused by |
99 | device disconnect), or some other fault. | 99 | device disconnect), or some other fault. |
100 | 100 | ||
101 | -ETIMEDOUT (**) No response packet received within the prescribed | 101 | -ETIME (**) No response packet received within the prescribed |
102 | bus turn-around time. This error may instead be | 102 | bus turn-around time. This error may instead be |
103 | reported as -EPROTO or -EILSEQ. | 103 | reported as -EPROTO or -EILSEQ. |
104 | 104 | ||
105 | Note that the synchronous USB message functions | 105 | -ETIMEDOUT Synchronous USB message functions use this code |
106 | also use this code to indicate timeout expired | 106 | to indicate timeout expired before the transfer |
107 | before the transfer completed. | 107 | completed, and no other error was reported by HC. |
108 | 108 | ||
109 | -EPIPE (**) Endpoint stalled. For non-control endpoints, | 109 | -EPIPE (**) Endpoint stalled. For non-control endpoints, |
110 | reset this status with usb_clear_halt(). | 110 | reset this status with usb_clear_halt(). |
@@ -163,6 +163,3 @@ usb_get_*/usb_set_*(): | |||
163 | usb_control_msg(): | 163 | usb_control_msg(): |
164 | usb_bulk_msg(): | 164 | usb_bulk_msg(): |
165 | -ETIMEDOUT Timeout expired before the transfer completed. | 165 | -ETIMEDOUT Timeout expired before the transfer completed. |
166 | In the future this code may change to -ETIME, | ||
167 | whose definition is a closer match to this sort | ||
168 | of error. | ||
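With this change the two timeout codes carry different meanings, so code that inspects a status value may want to tell them apart. A minimal sketch, assuming a platform whose errno.h defines both ETIME and ETIMEDOUT (as Linux does); the enum and function names are hypothetical:

```c
#include <errno.h>

enum usb_timeout_kind {
    USB_NOT_TIMEOUT,            /* some other status */
    USB_BUS_TURNAROUND_TIMEOUT, /* -ETIME: no response within turn-around time */
    USB_SYNC_CALL_TIMEOUT       /* -ETIMEDOUT: synchronous call's timeout expired */
};

/* Classify a negative status code according to the table above. */
static enum usb_timeout_kind classify_usb_timeout(int status)
{
    switch (status) {
    case -ETIME:
        return USB_BUS_TURNAROUND_TIMEOUT;
    case -ETIMEDOUT:
        return USB_SYNC_CALL_TIMEOUT;
    default:
        return USB_NOT_TIMEOUT;
    }
}
```

Note that the turn-around case may instead be reported as -EPROTO or -EILSEQ, so a USB_NOT_TIMEOUT result does not rule out a missing response packet.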
diff --git a/Documentation/usb/usb-serial.txt b/Documentation/usb/usb-serial.txt index 02b0f7beb6d1..a2dee6e6190d 100644 --- a/Documentation/usb/usb-serial.txt +++ b/Documentation/usb/usb-serial.txt | |||
@@ -433,6 +433,11 @@ Options supported: | |||
433 | See http://www.uuhaus.de/linux/palmconnect.html for up-to-date | 433 | See http://www.uuhaus.de/linux/palmconnect.html for up-to-date |
434 | information on this driver. | 434 | information on this driver. |
435 | 435 | ||
436 | AIRcable USB Dongle Bluetooth driver | ||
437 | If the cdc_acm driver is loaded in the system, you will find that | |
438 | cdc_acm claims the device before AIRcable can. This is simply corrected | |
439 | by unloading both modules and then loading the aircable module before | |
440 | the cdc_acm module. | |
436 | 441 | ||
437 | Generic Serial driver | 442 | Generic Serial driver |
438 | 443 | ||
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index 6da24e7a56cb..4303e0c12476 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt | |||
@@ -245,6 +245,13 @@ Debugging | |||
245 | newfallback: use new unwinder but fall back to old if it gets | 245 | newfallback: use new unwinder but fall back to old if it gets |
246 | stuck (default) | 246 | stuck (default) |
247 | 247 | ||
248 | call_trace=[old|both|newfallback|new] | ||
249 | old: use old inexact backtracer | ||
250 | new: use new exact dwarf2 unwinder | ||
251 | both: print entries from both | ||
252 | newfallback: use new unwinder but fall back to old if it gets | ||
253 | stuck (default) | ||
254 | |||
248 | Misc | 255 | Misc |
249 | 256 | ||
250 | noreplacement Don't replace instructions with more appropriate ones | 257 | noreplacement Don't replace instructions with more appropriate ones |
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks new file mode 100644 index 000000000000..bddfddd466ab --- /dev/null +++ b/Documentation/x86_64/kernel-stacks | |||
@@ -0,0 +1,99 @@ | |||
1 | Most of the text from Keith Owens, hacked by AK | ||
2 | |||
3 | x86_64 page size (PAGE_SIZE) is 4K. | ||
4 | |||
5 | Like all other architectures, x86_64 has a kernel stack for every | ||
6 | active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big. | ||
7 | These stacks contain useful data as long as a thread is alive or a | ||
8 | zombie. While the thread is in user space the kernel stack is empty | ||
9 | except for the thread_info structure at the bottom. | ||
10 | |||
11 | In addition to the per thread stacks, there are specialized stacks | ||
12 | associated with each cpu. These stacks are only used while the kernel | ||
13 | is in control on that cpu; when a cpu returns to user space the | |
14 | specialized stacks contain no useful data. The main cpu stack is | |
15 | |||
16 | * Interrupt stack. IRQSTACKSIZE | ||
17 | |||
18 | Used for external hardware interrupts. If this is the first external | ||
19 | hardware interrupt (i.e. not a nested hardware interrupt) then the | ||
20 | kernel switches from the current task to the interrupt stack. Like | ||
21 | the split thread and interrupt stacks on i386 (with CONFIG_4KSTACKS), | ||
22 | this gives more room for kernel interrupt processing without having | ||
23 | to increase the size of every per thread stack. | ||
24 | |||
25 | The interrupt stack is also used when processing a softirq. | ||
26 | |||
27 | Switching to the kernel interrupt stack is done by software based on a | ||
28 | per CPU interrupt nest counter. This is needed because x86-64 "IST" | ||
29 | hardware stacks cannot nest without races. | ||
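The software switch based on a nest counter can be modelled as follows. This is an illustrative sketch only: the structure and function names are hypothetical, and the real entry code is written in assembly and may use a different initial counter value.

```c
/* Per-CPU state for the model; irq_nest is 0 when no interrupt is active. */
struct cpu_model {
    int irq_nest;
};

/*
 * Called on interrupt entry.  Returns 1 if this is the outermost
 * interrupt, in which case the stack is switched to the interrupt
 * stack; nested interrupts return 0 and stay on the interrupt stack.
 */
static int irq_enter_model(struct cpu_model *cpu)
{
    return cpu->irq_nest++ == 0;
}

/* Called on interrupt exit; undoes the matching entry. */
static void irq_exit_model(struct cpu_model *cpu)
{
    cpu->irq_nest--;
}
```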
30 | |||
31 | x86_64 also has a feature which is not available on i386, the ability | ||
32 | to automatically switch to a new stack for designated events such as | ||
33 | double fault or NMI, which makes it easier to handle these unusual | ||
34 | events on x86_64. This feature is called the Interrupt Stack Table | ||
35 | (IST). There can be up to 7 IST entries per cpu. The IST code is an | ||
36 | index into the Task State Segment (TSS), the IST entries in the TSS | ||
37 | point to dedicated stacks, each stack can be a different size. | ||
38 | |||
39 | An IST is selected by a non-zero value in the IST field of an | |
40 | interrupt-gate descriptor. When an interrupt occurs and the hardware | ||
41 | loads such a descriptor, the hardware automatically sets the new stack | ||
42 | pointer based on the IST value, then invokes the interrupt handler. If | ||
43 | software wants to allow nested IST interrupts then the handler must | ||
44 | adjust the IST values on entry to and exit from the interrupt handler. | ||
45 | (this is occasionally done, e.g. for debug exceptions) | ||
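The selection rule can be illustrated with a small model. It is a sketch under simplifying assumptions: the structure is not the real TSS layout, the names are hypothetical, and the handling of an IST field of zero ignores privilege-level changes and simply keeps the current stack pointer.

```c
#include <stdint.h>

#define IST_ENTRIES 7

/* Simplified stand-in for the IST pointers held in the TSS. */
struct tss_model {
    uint64_t ist[IST_ENTRIES];  /* ist[0] corresponds to IST index 1 */
};

/*
 * Return the stack pointer the handler starts on.  ist_field is the
 * 3-bit IST value from the interrupt-gate descriptor.
 */
static uint64_t select_stack(const struct tss_model *tss,
                             unsigned int ist_field,
                             uint64_t current_rsp)
{
    ist_field &= 7;             /* the field is three bits wide */
    if (ist_field == 0)
        return current_rsp;     /* no IST: no forced stack switch */
    return tss->ist[ist_field - 1];
}
```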
46 | |||
47 | Events with different IST codes (i.e. with different stacks) can be | ||
48 | nested. For example, a debug interrupt can safely be interrupted by an | ||
49 | NMI. arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack | ||
50 | pointers on entry to and exit from all IST events, in theory allowing | ||
51 | IST events with the same code to be nested. However in most cases, the | ||
52 | stack size allocated to an IST assumes no nesting for the same code. | ||
53 | If that assumption is ever broken then the stacks will become corrupt. | ||
54 | |||
55 | The currently assigned IST stacks are :- | ||
56 | |||
57 | * STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
58 | |||
59 | Used for interrupt 12 - Stack Fault Exception (#SS). | ||
60 | |||
61 | This allows the kernel to recover from invalid stack segments. It | |
62 | rarely happens. | |
63 | |||
64 | * DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
65 | |||
66 | Used for interrupt 8 - Double Fault Exception (#DF). | ||
67 | |||
68 | Invoked when handling an exception causes another exception. Happens | |
69 | when the kernel is very confused (e.g. kernel stack pointer corrupt). | |
70 | Using a separate stack allows the kernel to recover well enough in | |
71 | many cases to still output an oops. | |
72 | |||
73 | * NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
74 | |||
75 | Used for non-maskable interrupts (NMI). | ||
76 | |||
77 | NMI can be delivered at any time, including when the kernel is in the | ||
78 | middle of switching stacks. Using IST for NMI events avoids making | ||
79 | assumptions about the previous state of the kernel stack. | ||
80 | |||
81 | * DEBUG_STACK. DEBUG_STKSZ | ||
82 | |||
83 | Used for hardware debug interrupts (interrupt 1) and for software | ||
84 | debug interrupts (INT3). | ||
85 | |||
86 | When debugging a kernel, debug interrupts (both hardware and | ||
87 | software) can occur at any time. Using IST for these interrupts | ||
88 | avoids making assumptions about the previous state of the kernel | ||
89 | stack. | ||
90 | |||
91 | * MCE_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
92 | |||
93 | Used for interrupt 18 - Machine Check Exception (#MC). | ||
94 | |||
95 | MCE can be delivered at any time, including when the kernel is in the | ||
96 | middle of switching stacks. Using IST for MCE events avoids making | ||
97 | assumptions about the previous state of the kernel stack. | ||
98 | |||
99 | For more details see the Intel IA32 or AMD AMD64 architecture manuals. | ||