37 files changed, 1985 insertions, 581 deletions
diff --git a/Documentation/ABI/obsolete/devfs b/Documentation/ABI/removed/devfs
index b8b87399bc8f..8195c4e0d0a1 100644
--- a/Documentation/ABI/obsolete/devfs
+++ b/Documentation/ABI/removed/devfs
@@ -1,13 +1,12 @@
 What:           devfs
-Date:           July 2005
+Date:           July 2005 (scheduled), finally removed in kernel v2.6.18
 Contact:        Greg Kroah-Hartman <gregkh@suse.de>
 Description:
        devfs has been unmaintained for a number of years, has unfixable
        races, contains a naming policy within the kernel that is
        against the LSB, and can be replaced by using udev.
-        The files fs/devfs/*, include/linux/devfs_fs*.h will be removed,
+        The files fs/devfs/*, include/linux/devfs_fs*.h were removed,
        along with the the assorted devfs function calls throughout the
        kernel tree.
 Users:
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
new file mode 100644
index 000000000000..d882f8093871
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-power
@@ -0,0 +1,88 @@
+What:           /sys/power/
+Date:           August 2006
+Contact:        Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+                The /sys/power directory contains files that provide a
+                unified interface to the power management subsystem.
+What:           /sys/power/state
+Date:           August 2006
+Contact:        Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+                The /sys/power/state file controls the system power state.
+                Reading from this file returns the supported states, which
+                are hard-coded to 'standby' (Power-On Suspend), 'mem'
+                (Suspend-to-RAM), and 'disk' (Suspend-to-Disk).  Writing one
+                of these strings to this file causes the system to
+                transition into that state.  Please see the file
+                Documentation/power/states.txt for a description of each of
+                these states.
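A minimal user-space sketch of the read-then-write protocol described above, assuming the file holds a single space-separated line such as "standby mem disk". The helper names power_state_supported() and set_power_state() are hypothetical; the path is a parameter so the logic can be exercised against an ordinary file, because writing to the real /sys/power/state requires root and starts the transition at once.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if 'state' (e.g. "mem") appears as a whole word in the
 * space-separated list read from 'path' (normally /sys/power/state),
 * 0 if it does not, and -1 if the file cannot be read. */
static int power_state_supported(const char *path, const char *state)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	for (char *tok = strtok(buf, " \n"); tok; tok = strtok(NULL, " \n"))
		if (strcmp(tok, state) == 0)
			return 1;
	return 0;
}

/* Write 'state' to 'path'.  On the real /sys/power/state this needs
 * root and immediately starts the transition.  Returns 0 or -1. */
static int set_power_state(const char *path, const char *state)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)	/* sysfs errors may only show up on flush */
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would typically check power_state_supported("/sys/power/state", "mem") before calling set_power_state() with the same arguments.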
+What:           /sys/power/disk
+Date:           August 2006
+Contact:        Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+                The /sys/power/disk file controls the operating mode of the
+                suspend-to-disk mechanism.  Reading from this file returns
+                the name of the method by which the system will be put to
+                sleep on the next suspend.  There are four methods supported:
+                'firmware' - means that the memory image will be saved to disk
+                by some firmware, in which case we also assume that the
+                firmware will handle the system suspend.
+                'platform' - the memory image will be saved by the kernel and
+                the system will be put to sleep by the platform driver (e.g.
+                ACPI or other PM registers).
+                'shutdown' - the memory image will be saved by the kernel and
+                the system will be powered off.
+                'reboot' - the memory image will be saved by the kernel and
+                the system will be rebooted.
+                The suspend-to-disk method may be chosen by writing one of
+                the accepted strings to this file:
+                'firmware'
+                'platform'
+                'shutdown'
+                'reboot'
+                It will only change to 'firmware' or 'platform' if the system
+                supports it.
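The sketch below checks a method against the four accepted strings before writing it. It assumes that the kernel reports a refused method (for example 'platform' on a system without platform support) as a failing write(); disk_method_valid() and set_disk_method() are hypothetical names, and the path is a parameter so the code can be tried on an ordinary file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* The four methods listed above for /sys/power/disk. */
static const char *const disk_methods[] = {
	"firmware", "platform", "shutdown", "reboot",
};

/* Predicate: return 1 if 'method' is one of the accepted names, else 0. */
static int disk_method_valid(const char *method)
{
	size_t i;

	for (i = 0; i < sizeof(disk_methods) / sizeof(disk_methods[0]); i++)
		if (strcmp(method, disk_methods[i]) == 0)
			return 1;
	return 0;
}

/* Command: write 'method' to 'path' (normally /sys/power/disk).
 * Returns 0 on success or a negative errno value.  Assumes the kernel
 * rejects an unsupported method by failing the write(). */
static int set_disk_method(const char *path, const char *method)
{
	ssize_t n;
	int fd, err = 0;

	if (!disk_method_valid(method))
		return -EINVAL;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, method, strlen(method));
	if (n < 0)
		err = -errno;	/* capture before close() can change errno */
	close(fd);
	return err;
}
```

Validating in user space only catches typos early; whether 'firmware' or 'platform' is actually accepted is still up to the kernel.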
+What:           /sys/power/image_size
+Date:           August 2006
+Contact:        Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+                The /sys/power/image_size file controls the size of the image
+                created by the suspend-to-disk mechanism.  A string
+                representing a non-negative integer may be written to it; the
+                value is used as an upper limit on the image size, in bytes.
+                The kernel's suspend-to-disk code does its best to keep the
+                image within this limit.  However, if that turns out to be
+                impossible, the kernel will try to suspend anyway using the
+                smallest image possible.  In particular, if "0" is written to
+                this file, the suspend image will be as small as possible.
+                Reading from this file displays the current image size
+                limit, which is set to 500 MB by default.
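As a sketch, the helpers below convert a limit given in megabytes into the decimal byte count the file expects and write it. They assume 1 MB = 1024 × 1024 bytes, which is a choice made for this example rather than something stated above; format_image_size() and set_image_size() are hypothetical names.

```c
#include <stdio.h>

/* Format a limit given in MB (assumed to be 1024 * 1024 bytes) as the
 * decimal byte count expected by /sys/power/image_size. */
static int format_image_size(char *buf, size_t len, unsigned long long mb)
{
	return snprintf(buf, len, "%llu", mb * 1024ULL * 1024ULL);
}

/* Write the limit to 'path'.  Returns 0 on success, -1 on failure. */
static int set_image_size(const char *path, unsigned long long mb)
{
	char buf[32];
	FILE *f;
	int ok;

	format_image_size(buf, sizeof(buf), mb);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(buf, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For instance, set_image_size("/sys/power/image_size", 500) would write 524288000, and set_image_size(path, 0) requests the smallest possible image.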
+What:           /sys/power/pm_trace
+Date:           August 2006
+Contact:        Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+                The /sys/power/pm_trace file controls the code which saves the
+                last PM event point in the RTC across reboots, so that you can
+                debug a machine that just hangs during suspend (or more
+                commonly, during resume).  The RTC is used to save the last
+                PM event point only if this file contains '1'.  Initially it
+                contains '0', which may be changed to '1' by writing a string
+                representing a nonzero integer into it.
+                To use this debugging feature, attempt to suspend the
+                machine, then reboot it and run
+                dmesg -s 1000000 | grep 'hash matches'
+                CAUTION: Using it will cause your machine's real-time (CMOS)
+                clock to be set to a random invalid time after a resume.
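The read and write halves of the pm_trace protocol can be sketched as follows. pm_trace_enabled() and pm_trace_enable() are hypothetical helper names, and the path is a parameter so the functions can be tried on an ordinary file before touching /sys/power/pm_trace, which, per the caution above, leaves the CMOS clock set to an invalid time after a resume.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if the file at 'path' (normally /sys/power/pm_trace) holds a
 * nonzero integer, 0 if it holds zero, and -1 if it cannot be read. */
static int pm_trace_enabled(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return strtol(buf, NULL, 10) != 0;
}

/* Enable tracing by writing "1"; any nonzero integer would do.
 * Returns 0 on success, -1 on failure. */
static int pm_trace_enable(const char *path)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs("1", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```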
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index 6d2412ec91ed..29c18966b050 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -532,6 +532,40 @@ appears outweighs the potential value of the hint that tells gcc to do
 something it would have done anyway.
+                Chapter 16: Function return values and names
+Functions can return values of many different kinds, and one of the
+most common is a value indicating whether the function succeeded or
+failed.  Such a value can be represented as an error-code integer
+(-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure,
+non-zero = success).
+Mixing up these two sorts of representations is a fertile source of
+difficult-to-find bugs.  If the C language included a strong distinction
+between integers and booleans then the compiler would find these mistakes
+for us... but it doesn't.  To help prevent such bugs, always follow this
+convention:
+        If the name of a function is an action or an imperative command,
+        the function should return an error-code integer.  If the name
+        is a predicate, the function should return a "succeeded" boolean.
+For example, "add work" is a command, and the add_work() function returns 0
+for success or -EBUSY for failure.  In the same way, "PCI device present" is
+a predicate, and the pci_dev_present() function returns 1 if it succeeds in
+finding a matching device or 0 if it doesn't.
+All EXPORTed functions must respect this convention, and so should all
+public functions.  Private (static) functions need not, but it is
+recommended that they do.
+Functions whose return value is the actual result of a computation, rather
+than an indication of whether the computation succeeded, are not subject to
+this rule.  Generally they indicate failure by returning some out-of-range
+result.  Typical examples would be functions that return pointers; they use
+NULL or the ERR_PTR mechanism to report failure.
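To make the convention concrete, here is a small user-space sketch. struct work_queue, MAX_WORK, queue_full() and find_work() are hypothetical, and add_work() merely mirrors the example in the text rather than any real kernel function:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 4		/* hypothetical queue depth */

struct work_queue {		/* hypothetical, for illustration only */
	int items[MAX_WORK];
	int count;
};

/* "add work" is a command: return 0 on success or -Exxx on failure. */
static int add_work(struct work_queue *q, int item)
{
	if (q->count == MAX_WORK)
		return -EBUSY;
	q->items[q->count++] = item;
	return 0;
}

/* "queue full" is a predicate: return true or false. */
static bool queue_full(const struct work_queue *q)
{
	return q->count == MAX_WORK;
}

/* Returns a computed result (a pointer), so the rule does not apply:
 * NULL signals "not found". */
static int *find_work(struct work_queue *q, int item)
{
	for (int i = 0; i < q->count; i++)
		if (q->items[i] == item)
			return &q->items[i];
	return NULL;
}
```

The payoff shows up at the call site: "err = add_work(q, item); if (err) return err;" reads naturally for a command, and "if (queue_full(q))" reads naturally for a predicate, so a test written the wrong way round stands out.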
                Appendix I: References
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index f8fe882e33dc..6d4b1ef5b6f1 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -181,27 +181,6 @@ X!Ilib/string.c
     </sect1>
  </chapter>
-  <chapter id="proc">
-     <title>The proc filesystem</title>
- 
-     <sect1><title>sysctl interface</title>
-!Ekernel/sysctl.c
-     </sect1>
-     <sect1><title>proc filesystem interface</title>
-!Ifs/proc/base.c
-     </sect1>
-  </chapter>
-  <chapter id="debugfs">
-     <title>The debugfs filesystem</title>
- 
-     <sect1><title>debugfs interface</title>
-!Efs/debugfs/inode.c
-!Efs/debugfs/file.c
-     </sect1>
-  </chapter>
  <chapter id="vfs">
     <title>The Linux VFS</title>
     <sect1><title>The Filesystem types</title>
@@ -234,6 +213,50 @@ X!Ilib/string.c
     </sect1>
  </chapter>
+  <chapter id="proc">
+     <title>The proc filesystem</title>
+ 
+     <sect1><title>sysctl interface</title>
+!Ekernel/sysctl.c
+     </sect1>
+     <sect1><title>proc filesystem interface</title>
+!Ifs/proc/base.c
+     </sect1>
+  </chapter>
+  <chapter id="sysfs">
+     <title>The Filesystem for Exporting Kernel Objects</title>
+!Efs/sysfs/file.c
+!Efs/sysfs/symlink.c
+!Efs/sysfs/bin.c
+  </chapter>
+  <chapter id="debugfs">
+     <title>The debugfs filesystem</title>
+ 
+     <sect1><title>debugfs interface</title>
+!Efs/debugfs/inode.c
+!Efs/debugfs/file.c
+     </sect1>
+  </chapter>
+  <chapter id="relayfs">
+     <title>relay interface support</title>
+     <para>
+        Relay interface support
+        is designed to provide an efficient mechanism for tools and
+        facilities to relay large amounts of data from kernel space to
+        user space.
+     </para>
+     <sect1><title>relay interface</title>
+!Ekernel/relay.c
+!Ikernel/relay.c
+     </sect1>
+  </chapter>
  <chapter id="netcore">
     <title>Linux Networking</title>
     <sect1><title>Networking Base Types</title>
@@ -349,13 +372,6 @@ X!Earch/i386/kernel/mca.c
     </sect1>
  </chapter>
-  <chapter id="sysfs">
-     <title>The Filesystem for Exporting Kernel Objects</title>
-!Efs/sysfs/file.c
-!Efs/sysfs/symlink.c
-!Efs/sysfs/bin.c
-  </chapter>
  <chapter id="security">
     <title>Security Framework</title>
 !Esecurity/security.c
@@ -386,6 +402,7 @@ X!Iinclude/linux/device.h
 -->
 !Edrivers/base/driver.c
 !Edrivers/base/core.c
+!Edrivers/base/class.c
 !Edrivers/base/firmware_class.c
 !Edrivers/base/transport_class.c
 !Edrivers/base/dmapool.c
@@ -437,6 +454,11 @@ X!Edrivers/pnp/system.c
 !Eblock/ll_rw_blk.c
  </chapter>
+  <chapter id="chrdev">
+        <title>Char devices</title>
+!Efs/char_dev.c
+  </chapter>
  <chapter id="miscdev">
     <title>Miscellaneous Devices</title>
 !Edrivers/char/misc.c
diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl
index 320af25de3a2..3608472d7b74 100644
--- a/Documentation/DocBook/usb.tmpl
+++ b/Documentation/DocBook/usb.tmpl
@@ -43,59 +43,52 @@
    <para>A Universal Serial Bus (USB) is used to connect a host,
    such as a PC or workstation, to a number of peripheral
-    devices.  USB uses a tree structure, with the host at the
+    devices.  USB uses a tree structure, with the host as the
    root (the system's master), hubs as interior nodes, and
-    peripheral devices as leaves (and slaves).
+    peripherals as leaves (and slaves).
    Modern PCs support several such trees of USB devices, usually
    one USB 2.0 tree (480 Mbit/sec each) with
    a few USB 1.1 trees (12 Mbit/sec each) that are used when you
    connect a USB 1.1 device directly to the machine's "root hub".
    </para>
-    <para>That master/slave asymmetry was designed in part for
+    <para>That master/slave asymmetry was designed-in for a number of
-    ease of use.  It is not physically possible to assemble
+    reasons, one being ease of use.  It is not physically possible to
-    (legal) USB cables incorrectly:  all upstream "to-the-host"
+    assemble (legal) USB cables incorrectly:  all upstream "to the host"
-    connectors are the rectangular type, matching the sockets on
+    connectors are the rectangular type (matching the sockets on
-    root hubs, and the downstream type are the squarish type
+    root hubs), and all downstream connectors are the squarish type
-    (or they are built in to the peripheral).
+    (or they are built into the peripheral).
-    Software doesn't need to deal with distributed autoconfiguration
+    Also, the host software doesn't need to deal with distributed
-    since the pre-designated master node manages all that.
+    auto-configuration since the pre-designated master node manages all that.
-    At the electrical level, bus protocol overhead is reduced by
+    And finally, at the electrical level, bus protocol overhead is reduced by
-    eliminating arbitration and moving scheduling into host software.
+    eliminating arbitration and moving scheduling into the host software.
    </para>
-    <para>USB 1.0 was announced in January 1996, and was revised
+    <para>USB 1.0 was announced in January 1996 and was revised
    as USB 1.1 (with improvements in hub specification and
    support for interrupt-out transfers) in September 1998.
-    USB 2.0 was released in April 2000, including high speed
+    USB 2.0 was released in April 2000, adding high-speed
-    transfers and transaction translating hubs (used for USB 1.1
+    transfers and transaction-translating hubs (used for USB 1.1
    and 1.0 backward compatibility).
    </para>
-    <para>USB support was added to Linux early in the 2.2 kernel series
+    <para>Kernel developers added USB support to Linux early in the 2.2 kernel
-    shortly before the 2.3 development forked off.  Updates
+    series, shortly before 2.3 development forked.  Updates from 2.3 were
-    from 2.3 were regularly folded back into 2.2 releases, bringing
+    regularly folded back into 2.2 releases, which improved reliability and
-    new features such as <filename>/sbin/hotplug</filename> support,
+    brought <filename>/sbin/hotplug</filename> support as well as more drivers.
-    more drivers, and more robustness.
+    Such improvements were continued in the 2.5 kernel series, where they added
-    The 2.5 kernel series continued such improvements, and also
+    USB 2.0 support, improved performance, and made the host controller drivers
-    worked on USB 2.0 support,
+    (HCDs) more consistent.  They also simplified the API (to make bugs less
-    higher performance,
+    likely) and added internal "kerneldoc" documentation.
-    better consistency between host controller drivers,
-    API simplification (to make bugs less likely),
-    and providing internal "kerneldoc" documentation.
    </para>
    <para>Linux can run inside USB devices as well as on
    the hosts that control the devices.
-    Because the Linux 2.x USB support evolved to support mass market
+    But USB device drivers running inside those peripherals
-    platforms such as Apple Macintosh or PC-compatible systems,
-    it didn't address design concerns for those types of USB systems.
-    So it can't be used inside mass-market PDAs, or other peripherals.
-    USB device drivers running inside those Linux peripherals
    don't do the same things as the ones running inside hosts,
-    and so they've been given a different name:
+    so they've been given a different name:
-    they're called <emphasis>gadget drivers</emphasis>.
+    <emphasis>gadget drivers</emphasis>.
-    This document does not present gadget drivers.
+    This document does not cover gadget drivers.
    </para>
    </chapter>
@@ -103,17 +96,14 @@
 <chapter id="host">
    <title>USB Host-Side API Model</title>
-    <para>Within the kernel,
+    <para>Host-side drivers for USB devices talk to the "usbcore" APIs.
-    host-side drivers for USB devices talk to the "usbcore" APIs.
+    There are two.  One is intended for
-    There are two types of public "usbcore" APIs, targetted at two different
+    <emphasis>general-purpose</emphasis> drivers (exposed through
-    layers of USB driver.  Those are
+    driver frameworks), and the other is for drivers that are
-    <emphasis>general purpose</emphasis> drivers, exposed through
+    <emphasis>part of the core</emphasis>.
-    driver frameworks such as block, character, or network devices;
+    Such core drivers include the <emphasis>hub</emphasis> driver
-    and drivers that are <emphasis>part of the core</emphasis>,
+    (which manages trees of USB devices) and several different kinds
-    which are involved in managing a USB bus.
+    of <emphasis>host controller drivers</emphasis>,
-    Such core drivers include the <emphasis>hub</emphasis> driver,
-    which manages trees of USB devices, and several different kinds
-    of <emphasis>host controller driver (HCD)</emphasis>,
    which control individual busses.
    </para>
@@ -122,21 +112,21 @@
     
    <itemizedlist>
-        <listitem><para>USB supports four kinds of data transfer
+        <listitem><para>USB supports four kinds of data transfers
-        (control, bulk, interrupt, and isochronous).  Two transfer
+        (control, bulk, interrupt, and isochronous).  Two of them (control
-        types use bandwidth as it's available (control and bulk),
+        and bulk) use bandwidth as it's available,
-        while the other two types of transfer (interrupt and isochronous)
+        while the other two (interrupt and isochronous)
        are scheduled to provide guaranteed bandwidth.
        </para></listitem>
        <listitem><para>The device description model includes one or more
        "configurations" per device, only one of which is active at a time.
-        Devices that are capable of high speed operation must also support
+        Devices that are capable of high-speed operation must also support
-        full speed configurations, along with a way to ask about the
+        full-speed configurations, along with a way to ask about the
-        "other speed" configurations that might be used.
+        "other speed" configurations which might be used.
        </para></listitem>
-        <listitem><para>Configurations have one or more "interface", each
+        <listitem><para>Configurations have one or more "interfaces", each
        of which may have "alternate settings".  Interfaces may be
        standardized by USB "Class" specifications, or may be specific to
        a vendor or device.</para>
@@ -162,7 +152,7 @@
        </para></listitem>
        <listitem><para>The Linux USB API supports synchronous calls for
-        control and bulk messaging.
+        control and bulk messages.
        It also supports asynchnous calls for all kinds of data transfer,
        using request structures called "URBs" (USB Request Blocks).
        </para></listitem>
@@ -463,14 +453,25 @@
            file in your Linux kernel sources.
            </para>
-            <para>Otherwise the main use for this file from programs
+            <para>This file, in combination with the poll() system call, can
-            is to poll() it to get notifications of usb devices
+            also be used to detect when devices are added or removed:
-            as they're plugged or unplugged.
+<programlisting>int fd;
-            To see what changed, you'd need to read the file and
+struct pollfd pfd;
-            compare "before" and "after" contents, scan the filesystem,
-            or see its hotplug event.
+fd = open("/proc/bus/usb/devices", O_RDONLY);
+pfd.fd = fd;
+pfd.events = POLLIN;
+for (;;) {
+        /* The first time through, this call will return immediately. */
+        poll(&amp;pfd, 1, -1);
+        /* To see what's changed, compare the file's previous and current
+           contents or scan the filesystem.  (Scanning is more precise.) */
+}</programlisting>
+            Note that this behavior is intended to be used for informational
+            and debug purposes.  It would be more appropriate to use programs
+            such as udev or HAL to initialize a device or start a user-mode
+            helper program, for instance.
            </para>
        </sect1>
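For readers who want to experiment, the following is a compilable sketch of the poll-based loop body described above. It keeps the descriptor open across iterations, as that loop does, and rewinds it before each re-read; usb_devices_wait() and usb_devices_rewind() are hypothetical helper names, and the 1 / 0 / -1 return values simply mirror poll().

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Block until poll() reports the already-open devices file readable, or
 * until 'timeout_ms' expires (-1 waits forever).  Returns 1 on an event,
 * 0 on timeout, -1 on error. */
static int usb_devices_wait(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int ret;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	ret = poll(&pfd, 1, timeout_ms);
	if (ret > 0 && !(pfd.revents & POLLIN))
		return -1;	/* e.g. POLLERR or POLLNVAL */
	return ret;
}

/* Rewind the file so the next read() sees the current device list. */
static int usb_devices_rewind(int fd)
{
	return lseek(fd, 0, SEEK_SET) == 0 ? 0 : -1;
}
```

A caller opens /proc/bus/usb/devices once, then alternates usb_devices_wait(fd, -1), usb_devices_rewind(fd) and a read of the new contents, comparing them with the previous copy.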
        <sect1>
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 915ae8c986c6..1d6560413cc5 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -358,7 +358,8 @@ Here is a list of some of the different kernel trees available:
  quilt trees:
    - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
        kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
+    - x86-64, partly i386, Andi Kleen <ak@suse.de>
+        ftp.firstfloor.org:/pub/ak/x86_64/quilt/
 Bug Reporting
 -------------
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index a10bfb6ecd9f..a6cb6ffd2933 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -61,3 +61,6 @@ kernel patches.
    Documentation/kernel-parameters.txt.
 18: All new module parameters are documented with MODULE_PARM_DESC()
+19: All new userspace interfaces are documented in Documentation/ABI/.
+    See Documentation/ABI/README for more information.
diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers
index 6bd30fdd0786..58bead05eabb 100644
--- a/Documentation/SubmittingDrivers
+++ b/Documentation/SubmittingDrivers
@@ -59,11 +59,11 @@ Copyright:	The copyright owner must agree to use of GPL.
                are the same person/entity. If not, the name of
                the person/entity authorizing use of GPL should be
                listed in case it's necessary to verify the will of
-                the copright owner.
+                the copyright owner.
 Interfaces:     If your driver uses existing interfaces and behaves like
                other drivers in the same class it will be much more likely
-                to be accepted than if it invents gratuitous new ones. 
+                to be accepted than if it invents gratuitous new ones.
                If you need to implement a common API over Linux and NT
                drivers do it in userspace.
@@ -88,7 +88,7 @@ Clarity:	It helps if anyone can see how to fix the driver. It helps
                it will go in the bitbucket.
 Control:        In general if there is active maintainance of a driver by
-                the author then patches will be redirected to them unless 
+                the author then patches will be redirected to them unless
                they are totally obvious and without need of checking.
                If you want to be the contact and update point for the
                driver it is a good idea to state this in the comments,
@@ -100,7 +100,7 @@ What Criteria Do Not Determine Acceptance
 Vendor:         Being the hardware vendor and maintaining the driver is
                often a good thing. If there is a stable working driver from
                other people already in the tree don't expect 'we are the
-                vendor' to get your driver chosen. Ideally work with the 
+                vendor' to get your driver chosen. Ideally work with the
                existing driver author to build a single perfect driver.
 Author:         It doesn't matter if a large Linux company wrote the driver,
@@ -116,17 +116,13 @@ Linux kernel master tree:
        ftp.??.kernel.org:/pub/linux/kernel/...
        ?? == your country code, such as "us", "uk", "fr", etc.
-Linux kernel mailing list:              
+Linux kernel mailing list:
        linux-kernel@vger.kernel.org
        [mail majordomo@vger.kernel.org to subscribe]
 Linux Device Drivers, Third Edition (covers 2.6.10):
        http://lwn.net/Kernel/LDD3/  (free version)
-Kernel traffic:
-        Weekly summary of kernel list activity (much easier to read)
-        http://www.kerneltraffic.org/kernel-traffic/
 LWN.net:
        Weekly summary of kernel development activity - http://lwn.net/
        2.6 API changes:
@@ -145,11 +141,8 @@ KernelNewbies:
 Linux USB project:
        http://www.linux-usb.org/
-How to NOT write kernel driver by arjanv@redhat.com
+How to NOT write a kernel driver, by Arjan van de Ven:
-        http://people.redhat.com/arjanv/olspaper.pdf
+        http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
 Kernel Janitor:
        http://janitor.kernelnewbies.org/
--
-Last updated on 17 Nov 2005.
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index d42ab4c9e893..302d148c2e18 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -173,15 +173,15 @@ For small patches you may want to CC the Trivial Patch Monkey
 trivial@kernel.org managed by Adrian Bunk; which collects "trivial"
 patches. Trivial patches must qualify for one of the following rules:
 Spelling fixes in documentation
- Spelling fixes which could break grep(1).
+ Spelling fixes which could break grep(1)
 Warning fixes (cluttering with useless warnings is bad)
 Compilation fixes (only if they are actually correct)
 Runtime fixes (only if they actually fix things)
- Removing use of deprecated functions/macros (eg. check_region).
+ Removing use of deprecated functions/macros (eg. check_region)
 Contact detail and documentation fixes
 Non-portable code replaced by portable code (even in arch-specific,
 since people copy, as long as it's trivial)
- Any fix by the author/maintainer of the file. (ie. patch monkey
+ Any fix by the author/maintainer of the file (ie. patch monkey
 in re-transmission mode)
 URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/>
@@ -209,6 +209,19 @@ Exception:  If your mailer is mangling patches then someone may ask
 you to re-send them using MIME.
+WARNING: Some mailers like Mozilla send your messages with
+---- message header ----
+Content-Type: text/plain; charset=us-ascii; format=flowed
+---- message header ----
+The problem is that "format=flowed" makes some mailers on the
+receiving side replace TABs with spaces and make similar changes,
+so your patches can arrive looking corrupted.
+To fix this, make your Mozilla defaults/pref/mailnews.js file contain:
+pref("mailnews.send_plaintext_flowed", false); // RFC 2646
+pref("mailnews.display.disable_format_flowed_support", true);
 7) E-mail size.
@@ -245,13 +258,13 @@ updated change.
 It is quite common for Linus to "drop" your patch without comment.
 That's the nature of the system.  If he drops your patch, it could be
 due to
-* Your patch did not apply cleanly to the latest kernel version
+* Your patch did not apply cleanly to the latest kernel version.
 * Your patch was not sufficiently discussed on linux-kernel.
-* A style issue (see section 2),
+* A style issue (see section 2).
-* An e-mail formatting issue (re-read this section)
+* An e-mail formatting issue (re-read this section).
-* A technical problem with your change
+* A technical problem with your change.
-* He gets tons of e-mail, and yours got lost in the shuffle
+* He gets tons of e-mail, and yours got lost in the shuffle.
-* You are being annoying (See Figure 1)
+* You are being annoying.
 When in doubt, solicit comments on linux-kernel mailing list.
@@ -476,10 +489,10 @@ SECTION 3 - REFERENCES
 Andrew Morton, "The perfect patch" (tpp).
  <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
-Jeff Garzik, "Linux kernel patch submission format."
+Jeff Garzik, "Linux kernel patch submission format".
  <http://linux.yyz.us/patch-format.html>
-Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer".
+Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
  <http://www.kroah.com/log/2005/03/31/>
  <http://www.kroah.com/log/2005/07/08/>
  <http://www.kroah.com/log/2005/10/19/>
@@ -488,9 +501,9 @@ Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer".
 NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
  <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>
-Kernel Documentation/CodingStyle
+Kernel Documentation/CodingStyle:
  <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
-Linus Torvald's mail on the canonical patch format:
+Linus Torvalds's mail on the canonical patch format:
  <http://lkml.org/lkml/2005/4/7/183>
 --
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 76b44290c154..842f0d1ab216 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -217,11 +217,11 @@ exclusive cpuset.  Also, the use of a Linux virtual file system (vfs)
 to represent the cpuset hierarchy provides for a familiar permission
 and name space for cpusets, with a minimum of additional kernel code.
-The cpus file in the root (top_cpuset) cpuset is read-only.
+The cpus and mems files in the root (top_cpuset) cpuset are
-It automatically tracks the value of cpu_online_map, using a CPU
+read-only.  The cpus file automatically tracks the value of
-hotplug notifier.  If and when memory nodes can be hotplugged,
+cpu_online_map using a CPU hotplug notifier, and the mems file
-we expect to make the mems file in the root cpuset read-only
+automatically tracks the value of node_online_map using the
-as well, and have it track the value of node_online_map.
+cpuset_track_online_nodes() hook.
 1.4 What are exclusive cpusets ?
diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index 66c725f530f3..addc67b1d770 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -2543,6 +2543,9 @@ Your cooperation is appreciated.
                 64 = /dev/usb/rio500   Diamond Rio 500
                 65 = /dev/usb/usblcd   USBLCD Interface (info@usblcd.de)
                 66 = /dev/usb/cpad0    Synaptics cPad (mouse/LCD)
+                 67 = /dev/usb/adutux0  1st Ontrak ADU device
+                    ...
+                 76 = /dev/usb/adutux9  10th Ontrak ADU device
                 96 = /dev/usb/hiddev0  1st USB HID device
                    ...
                111 = /dev/usb/hiddev15 16th USB HID device
diff --git a/Documentation/fb/intelfb.txt b/Documentation/fb/intelfb.txt
index c12d39a23c3d..aa0d322db171 100644
--- a/Documentation/fb/intelfb.txt
+++ b/Documentation/fb/intelfb.txt
@@ -1,16 +1,19 @@
-Intel 830M/845G/852GM/855GM/865G/915G Framebuffer driver
+Intel 830M/845G/852GM/855GM/865G/915G/945G Framebuffer driver
 ================================================================
 A. Introduction
-        This is a framebuffer driver for various Intel 810/815 compatible
+        This is a framebuffer driver for various Intel 8xx/9xx compatible
 graphics devices.  These would include:
        Intel 830M
-        Intel 810E845G
+        Intel 845G
        Intel 852GM
        Intel 855GM
        Intel 865G
        Intel 915G
+        Intel 915GM
+        Intel 945G
+        Intel 945GM
 B.  List of available options
@@ -78,7 +81,7 @@ C. Kernel booting
 Separate each option/option-pair by commas (,) and the option from its value
 with an equals sign (=) as in the following:
-video=i810fb:option1,option2=value2
+video=intelfb:option1,option2=value2
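The option syntax can be illustrated with a small parser. This is a hypothetical sketch of splitting the text after "video=intelfb:" into name/value pairs, not the driver's own option-parsing code; struct fb_option and parse_fb_options() are invented for the example.

```c
#include <stdio.h>
#include <string.h>

struct fb_option {
	char name[32];
	char value[32];		/* empty if the option had no "=value" part */
};

/* Split a string such as "option1,option2=value2" into at most 'max'
 * name/value pairs.  Returns the number of options parsed. */
static int parse_fb_options(const char *opts, struct fb_option *out, int max)
{
	char buf[256];
	char *tok;
	int n = 0;

	/* strtok() modifies its input, so work on a private copy. */
	snprintf(buf, sizeof(buf), "%s", opts);
	for (tok = strtok(buf, ","); tok && n < max; tok = strtok(NULL, ",")) {
		char *eq = strchr(tok, '=');

		if (eq)
			*eq = '\0';	/* split "name=value" in place */
		snprintf(out[n].name, sizeof(out[n].name), "%s", tok);
		snprintf(out[n].value, sizeof(out[n].value), "%s",
			 eq ? eq + 1 : "");
		n++;
	}
	return n;
}
```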
 Sample Usage
 ------------
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 9b9915044d3c..9364f47c7116 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,6 +6,21 @@ be removed from this file.
 ---------------------------
+What:   /sys/devices/.../power/state
+        dev->power.power_state
+        dpm_runtime_{suspend,resume}()
+When:   July 2007
+Why:    Broken design for runtime control over driver power states, confusing
+        driver-internal runtime power management with:  mechanisms to support
+        system-wide sleep state transitions; event codes that distinguish
+        different phases of swsusp "sleep" transitions; and userspace policy
+        inputs.  This framework was never widely used, and most attempts to
+        use it were broken.  Drivers should instead expose domain-specific
+        interfaces, either to the kernel or to userspace.
+Who:    Pavel Machek <pavel@suse.cz>
+---------------------------
 What:   RAW driver (CONFIG_RAW_DRIVER)
 When:   December 2005
 Why:    declared obsolete since kernel 2.6.3
@@ -31,15 +46,6 @@ Who:	Jody McIntyre <scjody@modernduck.com>
 ---------------------------
-What:   sbp2: module parameter "force_inquiry_hack"
-When:   July 2006
-Why:    Superceded by parameter "workarounds". Both parameters are meant to be
-        used ad-hoc and for single devices only, i.e. not in modprobe.conf,
-        therefore the impact of this feature replacement should be low.
-Who:    Stefan Richter <stefanr@s5r6.in-berlin.de>
---------------------------
 What:   Video4Linux API 1 ioctls and video_decoder.h from Video devices.
 When:   December 2006
 Why:    V4L1 API was replaced by V4L2 API during migration from 2.4 to 2.6
@@ -55,6 +61,18 @@ Who:	Mauro Carvalho Chehab <mchehab@brturbo.com.br>
 ---------------------------
+What:   sys_sysctl
+When:   January 2007
+Why:    The same information is available through /proc/sys and that is the
+        interface user space prefers to use. And there do not appear to be
+        any existing users of sys_sysctl in user space.  The additional
+        maintenance overhead of keeping a set of binary names gets
+        in the way of doing a good job of maintaining this interface.
+Who:    Eric Biederman <ebiederm@xmission.com>
+---------------------------
 What:   PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
 When:   November 2005
 Files:  drivers/pcmcia/: pcmcia_ioctl.c
@@ -202,14 +220,6 @@ Who:	Nick Piggin <npiggin@suse.de>
 ---------------------------
-What:   Support for the MIPS EV96100 evaluation board
-When:   September 2006
-Why:    Does no longer build since at least November 15, 2003, apparently
-        no userbase left.
-Who:    Ralf Baechle <ralf@linux-mips.org>
---------------------------
 What:   Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board
 When:   September 2006
 Why:    Has not built for quite some time, and was never popular,
@@ -294,3 +304,24 @@ Why:	The frame diverter is included in most distribution kernels, but is
        It is not clear if anyone is still using it.
 Who:    Stephen Hemminger <shemminger@osdl.org>
+---------------------------
+What:   PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
+When:   October 2008
+Why:    The stacking of class devices makes these values misleading and
+        inconsistent.
+        Class devices should not carry any of these properties, and bus
+        devices have SUBSYSTEM and DRIVER as a replacement.
+Who:    Kay Sievers <kay.sievers@suse.de>
+---------------------------
+What:   i2c-isa
+When:   December 2006
+Why:    i2c-isa is nonsense and doesn't fit in the device driver
+        model. Drivers relying on it are better implemented as platform
+        drivers.
+Who:    Jean Delvare <khali@linux-fr.org>
+---------------------------
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 99902ae6804e..7240ee7515de 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -39,6 +39,8 @@ Table of Contents
  2.9   Appletalk
  2.10  IPX
  2.11  /proc/sys/fs/mqueue - POSIX message queues filesystem
+  2.12  /proc/<pid>/oom_adj - Adjust the oom-killer score
+  2.13  /proc/<pid>/oom_score - Display current oom-killer score
 ------------------------------------------------------------------------------
 Preface
@@ -1124,11 +1126,15 @@ debugging information is displayed on console.
 NMI switch that most IA32 servers have fires unknown NMI up, for example.
 If a system hangs up, try pressing the NMI switch.
-[NOTE]
+nmi_watchdog
-   This function and oprofile share a NMI callback. Therefore this function
+------------
-   cannot be enabled when oprofile is activated.
-   And NMI watchdog will be disabled when the value in this file is set to
+Enables/Disables the NMI watchdog on x86 systems.  When the value is non-zero
-   non-zero.
+the NMI watchdog is enabled and will continuously test all online CPUs to
+determine whether or not they are still functioning properly.
+Because the NMI watchdog shares registers with oprofile, disabling the NMI
+watchdog may give oprofile more registers to use.
 2.4 /proc/sys/vm - The virtual memory subsystem
@@ -1958,6 +1964,22 @@ a queue must be less or equal then msg_max.
 maximum  message size value (it is every  message queue's attribute set during
 its creation).
+2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
+------------------------------------------------------
+This file can be used to adjust the score used to select which processes
+should be killed in an out-of-memory situation.  Giving it a high value will
+increase the likelihood of this process being killed by the oom-killer.  Valid
+values are in the range -16 to +15, plus the special value -17, which disables
+oom-killing altogether for this process.
+2.13 /proc/<pid>/oom_score - Display current oom-killer score
+-------------------------------------------------------------
+This file can be used to check the current score used by the oom-killer for
+any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which
+process should be killed in an out-of-memory situation.
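A minimal user-space sketch of the two files, assuming the oom_adj ranges documented above and that the caller is allowed to write /proc/<pid>/oom_adj (lowering the value usually needs extra privileges). The helper names are hypothetical:

```python
OOM_DISABLE = -17              # special value: never select this process
OOM_ADJ_MIN, OOM_ADJ_MAX = -16, 15


def valid_oom_adj(value: int) -> bool:
    """Return True if value is in the documented oom_adj range."""
    return value == OOM_DISABLE or OOM_ADJ_MIN <= value <= OOM_ADJ_MAX


def set_oom_adj(pid: int, value: int) -> None:
    """Range-check value, then write it to /proc/<pid>/oom_adj."""
    if not valid_oom_adj(value):
        raise ValueError(f"oom_adj {value} is not -17 or within -16..15")
    with open(f"/proc/{pid}/oom_adj", "w") as f:
        f.write(f"{value}\n")


def read_oom_score(pid: int) -> int:
    """Read the current oom-killer score of pid."""
    with open(f"/proc/{pid}/oom_score") as f:
        return int(f.read().strip())
```

For example, `set_oom_adj(os.getpid(), OOM_DISABLE)` would exempt the calling process, and `read_oom_score` can then be used to inspect the resulting score.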
 ------------------------------------------------------------------------------
 Summary
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87
index 9555be1ed999..e783fd62e308 100644
--- a/Documentation/hwmon/it87
+++ b/Documentation/hwmon/it87
@@ -13,12 +13,25 @@ Supported chips:
                       from Super I/O config space (8 I/O ports)
    Datasheet: Publicly available at the ITE website
               http://www.ite.com.tw/
+  * IT8716F
+    Prefix: 'it8716'
+    Addresses scanned: from Super I/O config space (8 I/O ports)
+    Datasheet: Publicly available at the ITE website
+               http://www.ite.com.tw/product_info/file/pc/IT8716F_V0.3.ZIP
+  * IT8718F
+    Prefix: 'it8718'
+    Addresses scanned: from Super I/O config space (8 I/O ports)
+    Datasheet: Publicly available at the ITE website
+               http://www.ite.com.tw/product_info/file/pc/IT8718F_V0.2.zip
+               http://www.ite.com.tw/product_info/file/pc/IT8718F_V0%203_(for%20C%20version).zip
  * SiS950   [clone of IT8705F]
    Prefix: 'it87'
    Addresses scanned: from Super I/O config space (8 I/O ports)
    Datasheet: No longer available
-Author: Christophe Gauthron <chrisg@0-in.com>
+Authors:
+    Christophe Gauthron <chrisg@0-in.com>
+    Jean Delvare <khali@linux-fr.org>
 Module Parameters
@@ -43,26 +56,46 @@ Module Parameters
 Description
 -----------
-This driver implements support for the IT8705F, IT8712F and SiS950 chips.
+This driver implements support for the IT8705F, IT8712F, IT8716F,
+IT8718F and SiS950 chips.
-This driver also supports IT8712F, which adds SMBus access, and a VID
-input, used to report the Vcore voltage of the Pentium processor.
-The IT8712F additionally features VID inputs.
 These chips are 'Super I/O chips', supporting floppy disks, infrared ports,
 joysticks and other miscellaneous stuff. For hardware monitoring, they
 include an 'environment controller' with 3 temperature sensors, 3 fan
 rotation speed sensors, 8 voltage sensors, and associated alarms.
+The IT8712F and IT8716F additionally feature VID inputs, used to report
+the Vcore voltage of the processor. Early IT8712F revisions have 5 VID
+pins; the IT8716F and late IT8712F revisions have 6. These pins are shared
+with other functions though, so the functionality may not be available on
+a given system. The driver simply assumes it is there.
+The IT8718F also features VID inputs (up to 8 pins) but the value is
+stored in the Super-I/O configuration space. Due to technical limitations,
+this value can currently only be read once at initialization time, so
+the driver won't notice and report changes in the VID value. The two
+upper VID bits share their pins with voltage inputs (in5 and in6), so you
+can't have both on a given board.
+The IT8716F, IT8718F and later IT8712F revisions have support for
+2 additional fans. They are not yet supported by the driver.
+The IT8716F, IT8718F, and late IT8712F and IT8705F revisions also have optional
+16-bit tachometer counters for fans 1 to 3. This is better (no more fan
+clock divider mess) but not compatible with the older chips and
+revisions. For now, the driver only uses the 16-bit mode on the
+IT8716F and IT8718F.
 Temperatures are measured in degrees Celsius. An alarm is triggered once
 when the Overtemperature Shutdown limit is crossed.
 Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
-triggered if the rotation speed has dropped below a programmable limit. Fan
+triggered if the rotation speed has dropped below a programmable limit. When
-readings can be divided by a programmable divider (1, 2, 4 or 8) to give the
+16-bit tachometer counters aren't used, fan readings can be divided by
-readings more range or accuracy. Not all RPM values can accurately be
+a programmable divider (1, 2, 4 or 8) to give the readings more range or
-represented, so some rounding is done. With a divider of 2, the lowest
+accuracy. With a divider of 2, the lowest representable value is around
-representable value is around 2600 RPM.
+2600 RPM. Not all RPM values can accurately be represented, so some rounding
+is done.
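To see where the "around 2600 RPM" figure comes from, here is a sketch of the 8-bit count-to-RPM conversion. The 1350000 constant and the special meaning of counts 0 and 255 are assumptions based on the it87 driver's 8-bit conversion, not values stated in this document:

```python
FAN_CLOCK = 1_350_000  # assumed conversion constant for 8-bit tach counts


def fan_rpm(count: int, divider: int) -> int:
    """Convert an 8-bit tachometer count to RPM for a divider of 1, 2, 4 or 8."""
    if divider not in (1, 2, 4, 8):
        raise ValueError("divider must be 1, 2, 4 or 8")
    if count == 255:       # counter saturated: fan too slow or stopped
        return 0
    if count == 0:         # no valid reading
        return -1
    return FAN_CLOCK // (count * divider)


def lowest_rpm(divider: int) -> int:
    """Lowest non-zero speed representable with this divider (count 254)."""
    return fan_rpm(254, divider)
```

`lowest_rpm(2)` evaluates to 2657, consistent with the figure above; each doubling of the divider halves the lowest representable speed, at the cost of coarser resolution at high speeds.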
 Voltage sensors (also known as IN sensors) report their values in volts. An
 alarm is triggered if the voltage has crossed a programmable minimum or
@@ -71,9 +104,9 @@ zero'; this is important for negative voltage measurements. All voltage
 inputs can measure voltages between 0 and 4.08 volts, with a resolution of
 0.016 volt. The battery voltage in8 does not have limit registers.
-The VID lines (IT8712F only) encode the core voltage value: the voltage
+The VID lines (IT8712F/IT8716F/IT8718F) encode the core voltage value:
-level your processor should work with. This is hardcoded by the mainboard
+the voltage level your processor should work with. This is hardcoded by
-and/or processor itself. It is a value in volts.
+the mainboard and/or processor itself. It is a value in volts.
 If an alarm triggers, it will remain triggered until the hardware register
 is read at least once. This means that the cause for the alarm may already
diff --git a/Documentation/hwmon/k8temp b/Documentation/hwmon/k8temp
new file mode 100644
index 000000000000..bab445ab0f52
--- /dev/null
+++ b/Documentation/hwmon/k8temp
@@ -0,0 +1,52 @@
+Kernel driver k8temp
+====================
+Supported chips:
+  * AMD K8 CPU
+    Prefix: 'k8temp'
+    Addresses scanned: PCI space
+    Datasheet: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf
+Author: Rudolf Marek
+Contact: Rudolf Marek <r.marek@sh.cvut.cz>
+Description
+-----------
+This driver permits reading temperature sensor(s) embedded inside AMD K8 CPUs.
+Official documentation says that it works from revision F of K8 core, but
+in fact it seems to be implemented for all revisions of K8 except the first
+two revisions (SH-B0 and SH-B3).
+There can be up to four temperature sensors inside a single CPU. The driver
+will auto-detect the sensors and will display only temperatures from
+implemented sensors.
+Mapping of /sys files is as follows:
+temp1_input - temperature of Core 0 and "place" 0
+temp2_input - temperature of Core 0 and "place" 1
+temp3_input - temperature of Core 1 and "place" 0
+temp4_input - temperature of Core 1 and "place" 1
+Temperatures are measured in degrees Celsius and measurement resolution is
+1 degree C. It is expected that future CPUs will have better resolution. The
+temperature is updated once a second. Valid temperatures are from -49 to
+206 degrees C.
+The temperature known as TCaseMax was specified for processors up to revision E.
+It is defined as the temperature between the heat-spreader and the CPU case,
+so the internal CPU temperature supplied by this driver can be higher.
+There is no easy way to measure a temperature that correlates with
+TCaseMax.
+For newer revisions of CPU (rev F, socket AM2) there is a mathematically
+computed temperature called TControl, which must be lower than TControlMax.
+The relationship is as follows:
+temp1_input - TjOffset*2 < TControlMax
+TjOffset is not yet exported by the driver; TControlMax is usually
+70 degrees C. As a rule of thumb, the CPU temperature should not exceed
+60 degrees C by much.
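The TControl relation above can be checked from user space with a sketch like the following. It assumes temp1_input uses the usual hwmon unit of millidegrees Celsius and that TjOffset, which the driver does not export, is supplied by the caller in degrees Celsius; 70 degrees C for TControlMax is only the typical value mentioned above. The function names are hypothetical:

```python
def read_millidegrees(path: str) -> int:
    """Read a hwmon temperature file (value in millidegrees Celsius)."""
    with open(path) as f:
        return int(f.read().strip())


def tcontrol(temp1_millideg: int, tj_offset_c: float) -> float:
    """TControl in degrees C: temp1_input - TjOffset*2."""
    return temp1_millideg / 1000.0 - tj_offset_c * 2


def within_tcontrol_max(temp1_millideg: int, tj_offset_c: float,
                        tcontrol_max_c: float = 70.0) -> bool:
    """True if TControl stays below TControlMax (70 C is only typical)."""
    return tcontrol(temp1_millideg, tj_offset_c) < tcontrol_max_c
```

The location of temp1_input depends on the system (somewhere under /sys/class/hwmon/), so `read_millidegrees` takes the path as an argument.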
diff --git a/Documentation/hwmon/vt1211 b/Documentation/hwmon/vt1211
new file mode 100644
index 000000000000..77fa633b97a8
--- /dev/null
+++ b/Documentation/hwmon/vt1211
@@ -0,0 +1,206 @@
+Kernel driver vt1211
+====================
+Supported chips:
+  * VIA VT1211
+    Prefix: 'vt1211'
+    Addresses scanned: none, address read from Super-I/O config space
+    Datasheet: Provided by VIA upon request and under NDA
+Authors: Juerg Haefliger <juergh@gmail.com>
+This driver is based on the driver for kernel 2.4 by Mark D. Studebaker and
+its port to kernel 2.6 by Lars Ekman.
+Thanks to Joseph Chan and Fiona Gatt from VIA for providing documentation and
+technical support.
+Module Parameters
+-----------------
+* uch_config: int       Override the BIOS default universal channel (UCH)
+                        configuration for channels 1-5.
+                        Legal values are in the range of 0-31. Bit 0 maps to
+                        UCH1, bit 1 maps to UCH2 and so on. Setting a bit to 1
+                        enables the thermal input of that particular UCH and
+                        setting a bit to 0 enables the voltage input.
+* int_mode: int         Override the BIOS default temperature interrupt mode.
+                        The only possible value is 0 which forces interrupt
+                        mode 0. In this mode, any pending interrupt is cleared
+                        when the status register is read but is regenerated as
+                        long as the temperature stays above the hysteresis
+                        limit.
+Be aware that overriding BIOS defaults might cause some unwanted side effects!
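The uch_config bitmask can be computed from the list of channels that should run in thermal mode. This sketch only encodes the bit layout stated in the parameter description (bit 0 = UCH1 through bit 4 = UCH5; 1 = thermal input, 0 = voltage input); the helper name is hypothetical:

```python
def uch_config(thermal_channels) -> int:
    """Build a uch_config value from the UCH numbers (1-5) to run in thermal mode.

    Channels not listed stay in voltage mode (bit cleared).
    """
    value = 0
    for uch in thermal_channels:
        if not 1 <= uch <= 5:
            raise ValueError(f"UCH{uch} does not exist; valid channels are 1-5")
        value |= 1 << (uch - 1)
    return value
```

For example, `uch_config([1, 3])` returns 5; loading the module with uch_config=5 would make UCH1 and UCH3 thermal inputs (temp3, temp5) and leave UCH2, UCH4 and UCH5 as voltage inputs (in1, in3, in4).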
+Description
+-----------
+The VIA VT1211 Super-I/O chip includes complete hardware monitoring
+capabilities. It monitors 2 dedicated temperature sensor inputs (temp1 and
+temp2), 1 dedicated voltage (in5) and 2 fans. Additionally, the chip
+implements 5 universal input channels (UCH1-5) that can be individually
+programmed to either monitor a voltage or a temperature.
+This chip also provides manual and automatic control of fan speeds (according
+to the datasheet). The driver only supports automatic control since the manual
+mode doesn't seem to work as advertised in the datasheet. In fact I couldn't
+get manual mode to work at all! Be aware that automatic mode hasn't been
+tested very well (due to the fact that my EPIA M10000 doesn't have the fans
+connected to the PWM outputs of the VT1211 :-().
+The following table shows the relationship between the vt1211 inputs and the
+sysfs nodes.
+Sensor          Voltage Mode   Temp Mode   Default Use (from the datasheet)
+------          ------------   ---------   --------------------------------
+Reading 1                      temp1       Intel thermal diode
+Reading 3                      temp2       Internal thermal diode
+UCH1/Reading2   in0            temp3       NTC type thermistor
+UCH2            in1            temp4       +2.5V
+UCH3            in2            temp5       VccP (processor core)
+UCH4            in3            temp6       +5V
+UCH5            in4            temp7       +12V
+3.3V            in5                        Internal VCC (+3.3V)
+Voltage Monitoring
+------------------
+Voltages are sampled by an 8-bit ADC with an LSB of ~10mV. The supported input
+range is thus from 0 to 2.60V. Voltage values outside of this range need
+external scaling resistors. This external scaling needs to be compensated for
+via compute lines in sensors.conf, like:
+compute inx @*(1+R1/R2), @/(1+R1/R2)
+The board level scaling resistors according to VIA's recommendation are as
+follows. And this is of course totally dependent on the actual board
+implementation :-) You will have to find documentation for your own
+motherboard and edit sensors.conf accordingly.
+                                      Expected
+Voltage       R1     R2     Divider   Raw Value
+-----------------------------------------------
+2.5V          2K     10K    1.2       2083 mV
+VccP          ---    ---    1.0       1400 mV (1)
+5V            14K    10K    2.4       2083 mV
+12V           47K    10K    5.7       2105 mV
+3.3V (int)    2K     3.4K   1.588     3300 mV (2)
+3.3V (ext)    6.8K   10K    1.68      1964 mV
+(1) Depending on the CPU (1.4V is for a VIA C3 Nehemiah).
+(2) R1 and R2 for 3.3V (int) are internal to the VT1211 chip and the driver
+    performs the scaling and returns the properly scaled voltage value.
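The Divider and Expected Raw Value columns follow directly from the resistor values: the divider is 1 + R1/R2, and the expected raw reading is the nominal voltage divided by it. A small sketch that reproduces the table and the inverse scaling performed by the sensors.conf compute line (the R1/R2 values are VIA's recommendations above; your board may differ, and the function names are hypothetical):

```python
def divider(r1_ohms: float, r2_ohms: float) -> float:
    """Scaling factor of an external R1/R2 voltage divider."""
    return 1 + r1_ohms / r2_ohms


def expected_raw_mv(nominal_mv: float, r1_ohms: float, r2_ohms: float) -> int:
    """Voltage the VT1211 input should see for a given nominal rail voltage."""
    return round(nominal_mv / divider(r1_ohms, r2_ohms))


def scaled_mv(raw_mv: float, r1_ohms: float, r2_ohms: float) -> float:
    """Undo the divider, as 'compute inx @*(1+R1/R2), ...' does."""
    return raw_mv * divider(r1_ohms, r2_ohms)
```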
+Each measured voltage has an associated low and high limit which triggers an
+alarm when crossed.
+Temperature Monitoring
+----------------------
+Temperatures are reported in millidegree Celsius. Each measured temperature
+has a high limit which triggers an alarm if crossed. There is an associated
+hysteresis value with each temperature below which the temperature has to drop
+before the alarm is cleared (this is only true for interrupt mode 0). The
+interrupt mode can be forced to 0 in case the BIOS doesn't do it
+automatically. See the 'Module Parameters' section for details.
+All temperature channels except temp2 are external. Temp2 is the VT1211
+internal thermal diode and the driver does all the scaling for temp2 and
+returns the temperature in millidegree Celsius. For the external channels
+temp1 and temp3-temp7, scaling depends on the board implementation and needs
+to be performed in userspace via sensors.conf.
+Temp1 is an Intel-type thermal diode which requires the following formula to
+convert between sysfs readings and real temperatures:
+compute temp1 (@-Offset)/Gain, (@*Gain)+Offset
+According to the VIA VT1211 BIOS porting guide, the following gain and offset
+values should be used:
+Diode Type      Offset   Gain
+----------      ------   ----
+Intel CPU       88.638   0.9528
+                65.000   0.9686   *)
+VIA C3 Ezra     83.869   0.9528
+VIA C3 Ezra-T   73.869   0.9528
+*) This is the formula from the lm_sensors 2.10.0 sensors.conf file. I don't
+know where it comes from or how it was derived; it's just listed here for
+completeness.
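The temp1 compute line is a linear mapping and its inverse. This sketch assumes the values are in degrees Celsius as libsensors presents them to sensors.conf (sysfs millidegrees divided by 1000) and uses the named rows of the table above; the dictionary keys and function names are hypothetical:

```python
# (offset, gain) pairs from the VIA VT1211 BIOS porting guide table above
DIODES = {
    "intel": (88.638, 0.9528),
    "via_c3_ezra": (83.869, 0.9528),
    "via_c3_ezra_t": (73.869, 0.9528),
}


def reading_to_temp(reading: float, diode: str = "intel") -> float:
    """Reading -> real temperature: (@ - Offset) / Gain."""
    offset, gain = DIODES[diode]
    return (reading - offset) / gain


def temp_to_reading(temp: float, diode: str = "intel") -> float:
    """Real temperature -> reading: (@ * Gain) + Offset, e.g. for limits."""
    offset, gain = DIODES[diode]
    return temp * gain + offset
```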
+Temp3-temp7 support NTC thermistors. For these channels, the driver returns
+the voltages as seen at the individual pins of UCH1-UCH5. The voltage at the
+pin (Vpin) is formed by a voltage divider made of the thermistor (Rth) and a
+scaling resistor (Rs):
+Vpin = 2200 * Rth / (Rs + Rth)   (2200 is the ADC max limit of 2200 mV)
+The equation for the thermistor is as follows (google it if you want to know
+more about it):
+Rth = Ro * exp(B * (1 / T - 1 / To))   (To is 298.15K (25C) and Ro is the
+                                        nominal resistance at 25C)
+Combining the above two equations and assuming Rs = Ro and B = 3435 yields the
+following formula for sensors.conf (` is the natural logarithm and ^ the
+exponential function):
+compute tempx 1 / (1 / 298.15 - (` (2200 / @ - 1)) / 3435) - 273.15,
+              2200 / (1 + (^ (3435 / 298.15 - 3435 / (273.15 + @))))
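The same conversion in Python, under the same assumptions as the text (Rs = Ro, B = 3435, 2200 mV ADC limit); `math.log` and `math.exp` play the roles of the ` and ^ operators. At Vpin = 1100 mV the thermistor resistance equals Rs, so the result should be 25 C. The function names are hypothetical:

```python
import math

V_MAX_MV = 2200.0   # ADC limit used in the divider equation above
B = 3435.0          # thermistor B constant, as assumed in the text
T0_K = 298.15       # 25 C in kelvin


def vpin_to_celsius(vpin_mv: float) -> float:
    """Pin voltage (mV) -> temperature (C), assuming Rs == Ro."""
    ratio = V_MAX_MV / vpin_mv - 1          # equals Rs/Rth = Ro/Rth
    return 1.0 / (1.0 / T0_K - math.log(ratio) / B) - 273.15


def celsius_to_vpin(temp_c: float) -> float:
    """Temperature (C) -> pin voltage (mV); inverse of vpin_to_celsius."""
    return V_MAX_MV / (1 + math.exp(B / T0_K - B / (273.15 + temp_c)))
```

Because the thermistor is NTC, a rising temperature lowers Rth and therefore Vpin.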
+Fan Speed Control
+-----------------
+The VT1211 provides 2 programmable PWM outputs to control the speeds of 2
+fans. Writing a 2 to any of the two pwm[1-2]_enable sysfs nodes will put the
+PWM controller in automatic mode. There is only a single controller that
+controls both PWM outputs but each PWM output can be individually enabled and
+disabled.
+Each PWM has 4 associated distinct output duty-cycles: full, high, low and
+off. Full and off are internally hard-wired to 255 (100%) and 0 (0%),
+respectively. High and low can be programmed via
+pwm[1-2]_auto_point[2-3]_pwm. Each PWM output can be associated with a
+different thermal input but - and here's the weird part - only one set of
+thermal thresholds exists that controls the duty-cycles of both PWM outputs. The
+thermal thresholds are accessible via pwm[1-2]_auto_point[1-4]_temp. Note
+that even though there are 2 sets of 4 auto points each, they map to the same
+registers in the VT1211 and programming one set is sufficient (actually only
+the first set pwm1_auto_point[1-4]_temp is writable, the second set is
+read-only).
+PWM Auto Point             PWM Output Duty-Cycle
+------------------------------------------------
+pwm[1-2]_auto_point4_pwm   full speed duty-cycle (hard-wired to 255)
+pwm[1-2]_auto_point3_pwm   high speed duty-cycle
+pwm[1-2]_auto_point2_pwm   low speed duty-cycle
+pwm[1-2]_auto_point1_pwm   off duty-cycle (hard-wired to 0)
+Temp Auto Point             Thermal Threshold
+---------------------------------------------
+pwm[1-2]_auto_point4_temp   full speed temp
+pwm[1-2]_auto_point3_temp   high speed temp
+pwm[1-2]_auto_point2_temp   low speed temp
+pwm[1-2]_auto_point1_temp   off temp
+Long story short, the controller implements the following algorithm to set the
+PWM output duty-cycle based on the input temperature:
+Thermal Threshold             Output Duty-Cycle
+                    (Rising Temp)           (Falling Temp)
+----------------------------------------------------------
+                    full speed duty-cycle   full speed duty-cycle
+full speed temp
+                    high speed duty-cycle   full speed duty-cycle
+high speed temp
+                    low speed duty-cycle    high speed duty-cycle
+low speed temp
+                    off duty-cycle          low speed duty-cycle
+off temp
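The table can be read as a hysteresis rule: while the temperature rises, the duty-cycle tracks the highest threshold reached; while it falls, the controller stays one step higher until the next lower threshold is crossed. A sketch of that reading of the table follows. It models the table, not the chip's internals, and the behavior below "off temp" is an assumption since the table does not list it:

```python
OFF, LOW, HIGH, FULL = 0, 1, 2, 3   # auto_point1..4_pwm duty-cycle levels


def duty_level(temp: float, off_t: float, low_t: float, high_t: float,
               full_t: float, rising: bool) -> int:
    """Duty-cycle level for a temperature, per the rising/falling table."""
    if not off_t <= low_t <= high_t <= full_t:
        raise ValueError("thresholds must be ordered off <= low <= high <= full")
    # Rising temperature: count the thresholds (low, high, full) reached.
    level = sum(temp >= t for t in (low_t, high_t, full_t))
    if rising:
        return level
    if temp < off_t:          # assumed: below 'off temp' the fan is off
        return OFF
    return min(level + 1, FULL)
```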
diff --git a/Documentation/hwmon/w83627ehf b/Documentation/hwmon/w83627ehf
new file mode 100644
index 000000000000..fae3b781d82d
--- /dev/null
+++ b/Documentation/hwmon/w83627ehf
@@ -0,0 +1,85 @@
+Kernel driver w83627ehf
+=======================
+Supported chips:
+  * Winbond W83627EHF/EHG (ISA access ONLY)
+    Prefix: 'w83627ehf'
+    Addresses scanned: ISA address retrieved from Super I/O registers
+    Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83627EHF_%20W83627EHGb.pdf
+Authors:
+        Jean Delvare <khali@linux-fr.org>
+        Yuan Mu (Winbond)
+        Rudolf Marek <r.marek@sh.cvut.cz>
+Description
+-----------
+This driver implements support for the Winbond W83627EHF and W83627EHG
+Super I/O chips. We will refer to them collectively as Winbond chips.
+The chips implement three temperature sensors, five fan rotation
+speed sensors, ten analog voltage sensors, alarms with beep warnings (control
+unimplemented), and some automatic fan regulation strategies (plus manual
+fan control mode).
+Temperatures are measured in degrees Celsius and measurement resolution is 1
+degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when
+the temperature gets higher than the high limit; it stays on until the
+temperature falls below the hysteresis value.
+Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
+triggered if the rotation speed has dropped below a programmable limit. Fan
+readings can be divided by a programmable divider (1, 2, 4, 8, 16, 32, 64 or
+128) to give the readings more range or accuracy. The driver sets the most
+suitable fan divisor itself. Some fans might not be present because they
+share pins with other functions.
+Voltage sensors (also known as IN sensors) report their values in millivolts.
+An alarm is triggered if the voltage has crossed a programmable minimum
+or maximum limit.
+The driver supports automatic fan control mode known as Thermal Cruise.
+In this mode, the chip attempts to keep the measured temperature in a
+predefined temperature range. If the temperature goes out of range, the fan
+is driven slower/faster to bring the temperature back into the range.
+The mode works for fan1-fan4. Mapping of temperatures to pwm outputs is as
+follows:
+temp1 -> pwm1
+temp2 -> pwm2
+temp3 -> pwm3
+prog  -> pwm4 (the programmable setting is not supported by the driver)
+/sys files
+----------
+pwm[1-4] - this file stores PWM duty cycle or DC value (fan speed) in range:
+           0 (stop) to 255 (full)
+pwm[1-4]_enable - this file controls mode of fan/temperature control:
+        * 1 Manual Mode, write to pwm file any value 0-255 (full speed)
+        * 2 Thermal Cruise
+Thermal Cruise mode
+-------------------
+If the temperature is in the range defined by:
+pwm[1-4]_target    - set target temperature, unit millidegree Celsius
+                     (range 0 - 127000)
+pwm[1-4]_tolerance - tolerance, unit millidegree Celsius (range 0 - 15000)
+there are no changes to fan speed. Once the temperature leaves the interval,
+the fan speed increases (if the temperature is higher) or decreases (if it is
+lower than desired). There are defined steps and times, but they are not
+exported by the driver yet.
+pwm[1-4]_min_output - minimum fan speed (range 1 - 255), when the temperature
+                      is below the defined range.
+pwm[1-4]_stop_time  - how many milliseconds [ms] must elapse before the
+                      corresponding fan is switched off (when the temperature
+                      is below the defined range).
+Note: the last two settings are influenced by other control bits, not yet
+      exported by the driver, so a change might not have any effect.
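The Thermal Cruise decision can be modeled as below. This is a simplification: the chip's step sizes and timing are not exported, so the sketch only returns the direction of change. It assumes the "range" is target plus or minus tolerance with inclusive edges, and uses the millidegree units of the sysfs files; the function name is hypothetical:

```python
def cruise_action(temp_mc: int, target_mc: int, tolerance_mc: int) -> int:
    """Return +1 (speed up), -1 (slow down) or 0 (hold) for Thermal Cruise.

    All arguments are in millidegrees Celsius, like pwm[1-4]_target and
    pwm[1-4]_tolerance.
    """
    if not 0 <= target_mc <= 127000:
        raise ValueError("target out of range 0-127000")
    if not 0 <= tolerance_mc <= 15000:
        raise ValueError("tolerance out of range 0-15000")
    if temp_mc > target_mc + tolerance_mc:
        return 1        # too hot: drive the fan faster
    if temp_mc < target_mc - tolerance_mc:
        return -1       # too cool: drive the fan slower
    return 0            # inside the band: leave the fan speed alone
```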
diff --git a/Documentation/hwmon/w83791d b/Documentation/hwmon/w83791d
index 83a3836289c2..19b2ed739fa1 100644
--- a/Documentation/hwmon/w83791d
+++ b/Documentation/hwmon/w83791d
@@ -5,7 +5,7 @@ Supported chips:
  * Winbond W83791D
    Prefix: 'w83791d'
    Addresses scanned: I2C 0x2c - 0x2f
-    Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791Da.pdf
+    Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791D_W83791Gb.pdf
 Author: Charles Spirakis <bezaur@gmail.com>
@@ -20,6 +20,9 @@ Credits:
    Chunhao Huang <DZShen@Winbond.com.tw>,
    Rudolf Marek <r.marek@sh.cvut.cz>
+Additional contributors:
+    Sven Anders <anders@anduras.de>
 Module Parameters
 -----------------
@@ -46,7 +49,8 @@ Module Parameters
 Description
 -----------
-This driver implements support for the Winbond W83791D chip.
+This driver implements support for the Winbond W83791D chip. The W83791G
+chip appears to be the same as the W83791D but is lead free.
 Detection of the chip can sometimes be foiled because it can be in an
 internal state that allows no clean access (Bank with ID register is not
@@ -71,34 +75,36 @@ Voltage sensors (also known as IN sensors) report their values in millivolts.
 An alarm is triggered if the voltage has crossed a programmable minimum
 or maximum limit.
-Alarms are provided as output from a "realtime status register". The
+The bit orderings for the alarm "realtime status register" and the
-following bits are defined:
+"beep enable registers" are different.
-bit - alarm on:
+in0 (VCORE)  :  alarms: 0x000001 beep_enable: 0x000001
-0  - Vcore
+in1 (VINR0)  :  alarms: 0x000002 beep_enable: 0x002000 <== mismatch
-1  - VINR0
+in2 (+3.3VIN):  alarms: 0x000004 beep_enable: 0x000004
-2  - +3.3VIN
+in3 (5VDD)   :  alarms: 0x000008 beep_enable: 0x000008
-3  - 5VDD
+in4 (+12VIN) :  alarms: 0x000100 beep_enable: 0x000100
-4  - temp1
+in5 (-12VIN) :  alarms: 0x000200 beep_enable: 0x000200
-5  - temp2
+in6 (-5VIN)  :  alarms: 0x000400 beep_enable: 0x000400
-6  - fan1
+in7 (VSB)    :  alarms: 0x080000 beep_enable: 0x010000 <== mismatch
-7  - fan2
+in8 (VBAT)   :  alarms: 0x100000 beep_enable: 0x020000 <== mismatch
-8  - +12VIN
+in9 (VINR1)  :  alarms: 0x004000 beep_enable: 0x004000
-9  - -12VIN
+temp1        :  alarms: 0x000010 beep_enable: 0x000010
-10 - -5VIN
+temp2        :  alarms: 0x000020 beep_enable: 0x000020
-11 - fan3
+temp3        :  alarms: 0x002000 beep_enable: 0x000002 <== mismatch
-12 - chassis
+fan1         :  alarms: 0x000040 beep_enable: 0x000040
-13 - temp3
+fan2         :  alarms: 0x000080 beep_enable: 0x000080
-14 - VINR1
+fan3         :  alarms: 0x000800 beep_enable: 0x000800
-15 - reserved
+fan4         :  alarms: 0x200000 beep_enable: 0x200000
-16 - tart1
+fan5         :  alarms: 0x400000 beep_enable: 0x400000
-17 - tart2
+tart1        :  alarms: 0x010000 beep_enable: 0x040000 <== mismatch
-18 - tart3
+tart2        :  alarms: 0x020000 beep_enable: 0x080000 <== mismatch
-19 - VSB
+tart3        :  alarms: 0x040000 beep_enable: 0x100000 <== mismatch
-20 - VBAT
+case_open    :  alarms: 0x001000 beep_enable: 0x001000
-21 - fan4
+user_enable  :  alarms: -------- beep_enable: 0x800000
-22 - fan5
-23 - reserved
+*** NOTE: It is the responsibility of user-space code to handle the fact
+that the beep enable and alarm bits are in different positions when using that
+feature of the chip.
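One way user-space code could handle the mismatch is a lookup table built from the list above, translating alarm bits into the corresponding beep-enable bits and back. The bit values are copied from the list; the names of the table and functions are hypothetical, and user_enable is left out because it has no alarm bit:

```python
# name: (alarm bit, beep_enable bit), copied from the list above
W83791D_BITS = {
    "in0": (0x000001, 0x000001), "in1": (0x000002, 0x002000),
    "in2": (0x000004, 0x000004), "in3": (0x000008, 0x000008),
    "in4": (0x000100, 0x000100), "in5": (0x000200, 0x000200),
    "in6": (0x000400, 0x000400), "in7": (0x080000, 0x010000),
    "in8": (0x100000, 0x020000), "in9": (0x004000, 0x004000),
    "temp1": (0x000010, 0x000010), "temp2": (0x000020, 0x000020),
    "temp3": (0x002000, 0x000002), "fan1": (0x000040, 0x000040),
    "fan2": (0x000080, 0x000080), "fan3": (0x000800, 0x000800),
    "fan4": (0x200000, 0x200000), "fan5": (0x400000, 0x400000),
    "tart1": (0x010000, 0x040000), "tart2": (0x020000, 0x080000),
    "tart3": (0x040000, 0x100000), "case_open": (0x001000, 0x001000),
}


def alarms_to_beep_mask(alarms: int) -> int:
    """Beep-enable mask covering every source whose alarm bit is set."""
    mask = 0
    for alarm_bit, beep_bit in W83791D_BITS.values():
        if alarms & alarm_bit:
            mask |= beep_bit
    return mask


def beep_mask_to_alarms(beep: int) -> int:
    """Inverse mapping: alarm bits for every enabled beep source."""
    alarms = 0
    for alarm_bit, beep_bit in W83791D_BITS.values():
        if beep & beep_bit:
            alarms |= alarm_bit
    return alarms
```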
 When an alarm goes off, you can be warned by a beeping signal through your
 computer speaker. It is possible to enable all beeping globally, or only
@@ -109,5 +115,6 @@ often will do no harm, but will return 'old' values.
 W83791D TODO:
 ---------------
-Provide a patch for per-file alarms as discussed on the mailing list
+Provide a patch for per-file alarms and beep enables as defined in the hwmon
+        documentation (Documentation/hwmon/sysfs-interface)
 Provide a patch for smart-fan control (still need appropriate motherboard/fans)
diff --git a/Documentation/i2c/busses/i2c-viapro b/Documentation/i2c/busses/i2c-viapro
index 16775663b9f5..25680346e0ac 100644
--- a/Documentation/i2c/busses/i2c-viapro
+++ b/Documentation/i2c/busses/i2c-viapro
@@ -7,9 +7,12 @@ Supported adapters:
  * VIA Technologies, Inc. VT82C686A/B
    Datasheet: Sometimes available at the VIA website
-  * VIA Technologies, Inc. VT8231, VT8233, VT8233A, VT8235, VT8237R
+  * VIA Technologies, Inc. VT8231, VT8233, VT8233A
    Datasheet: available on request from VIA
+  * VIA Technologies, Inc. VT8235, VT8237R, VT8237A, VT8251
+    Datasheet: available on request and under NDA from VIA
 Authors:
       Kyösti Mälkki <kmalkki@cc.hut.fi>,
        Mark D. Studebaker <mdsxyz123@yahoo.com>,
@@ -39,6 +42,8 @@ Your lspci -n listing must show one of these :
 device 1106:8235   (VT8231 function 4)
 device 1106:3177   (VT8235)
 device 1106:3227   (VT8237R)
+ device 1106:3337   (VT8237A)
+ device 1106:3287   (VT8251)
 If none of these show up, you should look in the BIOS for settings like
 enable ACPI / SMBus or even USB.
diff --git a/Documentation/i2c/i2c-stub b/Documentation/i2c/i2c-stub
index d6dcb138abf5..9cc081e69764 100644
--- a/Documentation/i2c/i2c-stub
+++ b/Documentation/i2c/i2c-stub
@@ -6,9 +6,12 @@ This module is a very simple fake I2C/SMBus driver.  It implements four
 types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and
 (r/w) word data.
+You need to provide a chip address as a module parameter when loading
+this driver, which will then only react to SMBus commands to this address.
 No hardware is needed nor associated with this module.  It will accept write
-quick commands to all addresses; it will respond to the other commands (also
+quick commands to one address; it will respond to the other commands (also
-to all addresses) by reading from or writing to an array in memory.  It will
+to one address) by reading from or writing to an array in memory.  It will
 also spam the kernel logs for every command it handles.
 A pointer register with auto-increment is implemented for all byte
@@ -21,6 +24,11 @@ The typical use-case is like this:
        3. load the target sensors chip driver module
        4. observe its behavior in the kernel log
+PARAMETERS:
+int chip_addr:
+        The SMBus address to emulate a chip at.
 CAVEATS:
 There are independent arrays for byte/data and word/data commands.  Depending
@@ -33,6 +41,9 @@ If the hardware for your driver has banked registers (e.g. Winbond sensors
 chips) this module will not work well - although it could be extended to
 support that pretty easily.
+Only one chip address is supported - although this module could be
+extended to support more.
 If you spam it hard enough, printk can be lossy.  This module really wants
 something like relayfs.
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index b7d6abb501a6..e2cbd59cf2d0 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -421,6 +421,11 @@ more details, with real examples.
        The second argument is optional, and if supplied will be used
        if first argument is not supported.
+    as-instr
+        as-instr checks whether the assembler accepts a specific
+        instruction and outputs option1 if it does, option2 otherwise.
+        C escapes are supported in the test instruction.
    cc-option
        cc-option is used to check if $(CC) supports a given option, and not
        supported to use an optional second option.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 71d05f481727..137e993f4329 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -110,6 +110,13 @@ be entered as an environment variable, whereas its absence indicates that
 it will appear as a kernel argument readable via /proc/cmdline by programs
 running once the system is up.
+The number of kernel parameters is not limited, but the length of the
+complete command line (parameters including spaces etc.) is limited to
+a fixed number of characters. This limit depends on the architecture
+and is between 256 and 4096 characters. It is defined in the file
+./include/asm/setup.h as COMMAND_LINE_SIZE.
        53c7xx=         [HW,SCSI] Amiga SCSI controllers
                        See header of drivers/scsi/53c7xx.c.
                        See also Documentation/scsi/ncr53c7xx.txt.
@@ -573,8 +580,6 @@ running once the system is up.
        gscd=           [HW,CD]
                        Format: <io>
-        gt96100eth=     [NET] MIPS GT96100 Advanced Communication Controller
        gus=            [HW,OSS]
                        Format: <io>,<irq>,<dma>,<dma16>
@@ -1240,7 +1245,11 @@ running once the system is up.
                                bootloader. This is currently used on
                                IXP2000 systems where the bus has to be
                                configured a certain way for adjunct CPUs.
+                noearly         [X86] Don't do any early type 1 scanning.
+                                This might help on some broken boards which
+                                machine check when some devices' config space
+                                is read. But various workarounds are disabled
+                                and some IOMMU drivers will not work.
        pcmv=           [HW,PCMCIA] BadgePAD 4
        pd.             [PARIDE]
@@ -1322,7 +1331,7 @@ running once the system is up.
        pt.             [PARIDE]
                        See Documentation/paride.txt.
-        quiet=          [KNL] Disable log messages
+        quiet           [KNL] Disable most log messages
        r128=           [HW,DRM]
@@ -1363,6 +1372,14 @@ running once the system is up.
        reserve=        [KNL,BUGS] Force the kernel to ignore some iomem area
+        reservetop=     [IA-32]
+                        Format: nn[KMG]
+                        Reserves a hole at the top of the kernel virtual
+                        address space.
+        reset_devices   [KNL] Force drivers to reset the underlying device
+                        during initialization.
        resume=         [SWSUSP]
                        Specify the partition device for software suspend
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index afac780445cd..dc942eaf490f 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -192,6 +192,17 @@ or, for backwards compatibility, the option value.  E.g.,
 arp_interval
        Specifies the ARP link monitoring frequency in milliseconds.
+        The ARP monitor works by periodically checking the slave
+        devices to determine whether they have sent or received
+        traffic recently (the precise criteria depend upon the
+        bonding mode, and the state of the slave).  Regular traffic is
+        generated via ARP probes issued for the addresses specified by
+        the arp_ip_target option.
+        This behavior can be modified by the arp_validate option,
+        below.
        If ARP monitoring is used in an etherchannel compatible mode
        (modes 0 and 2), the switch should be configured in a mode
        that evenly distributes packets across all links. If the
@@ -213,6 +224,54 @@ arp_ip_target
        maximum number of targets that can be specified is 16.  The
        default value is no IP addresses.
+arp_validate
+        Specifies whether or not ARP probes and replies should be
+        validated in the active-backup mode.  This causes the ARP
+        monitor to examine the incoming ARP requests and replies, and
+        only consider a slave to be up if it is receiving the
+        appropriate ARP traffic.
+        Possible values are:
+        none or 0
+                No validation is performed.  This is the default.
+        active or 1
+                Validation is performed only for the active slave.
+        backup or 2
+                Validation is performed only for backup slaves.
+        all or 3
+                Validation is performed for all slaves.
+        For the active slave, the validation checks ARP replies to
+        confirm that they were generated by an arp_ip_target.  Since
+        backup slaves do not typically receive these replies, the
+        validation performed for backup slaves is on the ARP request
+        sent out via the active slave.  It is possible that some
+        switch or network configurations may result in situations
+        wherein the backup slaves do not receive the ARP requests; in
+        such a situation, validation of backup slaves must be
+        disabled.
+        This option is useful in network configurations in which
+        multiple bonding hosts are concurrently issuing ARPs to one or
+        more targets beyond a common switch.  Should the link between
+        the switch and target fail (but not the switch itself), the
+        probe traffic generated by the multiple bonding instances will
+        fool the standard ARP monitor into considering the links as
+        still up.  Use of the arp_validate option can resolve this, as
+        the ARP monitor will only consider ARP requests and replies
+        associated with its own instance of bonding.
+        This option was added in bonding version 3.1.0.
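The mapping between arp_validate values and the slaves that get validated can be sketched as follows. This is a hypothetical userspace model, not bonding driver code. It assumes the numeric values act as a bitmask (1 for the active slave, 2 for backup slaves, so "all" = 3 covers both), which is consistent with the table above. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical constants mirroring the documented values.
 * Assumption: bit 0 selects the active slave, bit 1 the backup slaves. */
enum arp_validate {
	ARP_VALIDATE_NONE   = 0,
	ARP_VALIDATE_ACTIVE = 1,
	ARP_VALIDATE_BACKUP = 2,
	ARP_VALIDATE_ALL    = 3,
};

/* Accepts either the name or the number, as the option does.
 * Returns -1 for an unrecognised value. */
static int parse_arp_validate(const char *s)
{
	static const char *const names[] = { "none", "active", "backup", "all" };
	static const char *const nums[]  = { "0", "1", "2", "3" };

	assert(s != NULL);
	for (int i = 0; i < 4; i++)
		if (strcmp(s, names[i]) == 0 || strcmp(s, nums[i]) == 0)
			return i;
	return -1;
}

/* Does the ARP monitor validate this slave under the given mode? */
static bool slave_is_validated(int mode, bool is_active_slave)
{
	assert(mode >= ARP_VALIDATE_NONE && mode <= ARP_VALIDATE_ALL);
	int bit = is_active_slave ? ARP_VALIDATE_ACTIVE : ARP_VALIDATE_BACKUP;
	return (mode & bit) != 0;
}
```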
 downdelay
        Specifies the time, in milliseconds, to wait before disabling
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 44f2f769e865..18d385c068fc 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -100,6 +100,7 @@ Examples:
                         are: IPSRC_RND #IP Source is random (between min/max),
                              IPDST_RND, UDPSRC_RND,
                              UDPDST_RND, MACSRC_RND, MACDST_RND 
+                              MPLS_RND, VID_RND, SVID_RND
 pgset "udp_src_min 9"   set UDP source port min, If < udp_src_max, then
                         cycle through the port range.
@@ -125,6 +126,21 @@ Examples:
 pgset "mpls 0"           turn off mpls (or any invalid argument works too!)
+ pgset "vlan_id 77"       set VLAN ID 0-4095
+ pgset "vlan_p 3"         set priority 0-7 (default 0)
+ pgset "vlan_cfi 0"       set canonical format identifier 0-1 (default 0)
+ pgset "svlan_id 22"      set SVLAN ID 0-4095
+ pgset "svlan_p 3"        set priority 0-7 (default 0)
+ pgset "svlan_cfi 0"      set canonical format identifier 0-1 (default 0)
+ pgset "vlan_id 9999"     > 4095 remove vlan and svlan tags
+ pgset "svlan_id 9999"    > 4095 remove svlan tag
+ pgset "tos XX"           set former IPv4 TOS field (e.g. "tos 28" for AF11 no ECN, default 00)
+ pgset "traffic_class XX" set former IPv6 TRAFFIC CLASS (e.g. "traffic_class B8" for EF no ECN, default 00)
 pgset stop               aborts injection. Also, ^C aborts generator.
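The vlan_p, vlan_cfi and vlan_id settings correspond to the three fields of the 16-bit 802.1Q Tag Control Information: a 3-bit priority, a 1-bit CFI and a 12-bit VLAN ID. The tos/traffic_class byte carries a 6-bit DSCP above a 2-bit ECN field. The sketch below packs both. The function names are hypothetical. It assumes the XX argument is hexadecimal, which the AF11 example (DSCP 10, giving 0x28) and the EF example (DSCP 46, giving 0xB8) suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Pack priority, CFI and VLAN ID into the 802.1Q TCI field.
 * Returns -1 when vid > 4095, which pktgen documents as "remove the tag". */
static int32_t vlan_tci(unsigned prio, unsigned cfi, unsigned vid)
{
	if (vid > 4095)
		return -1;                  /* no tag */
	assert(prio <= 7 && cfi <= 1);
	return (int32_t)((prio << 13) | (cfi << 12) | vid);
}

/* Build the IPv4 TOS / IPv6 traffic class byte: DSCP in the upper
 * six bits, ECN in the lower two. */
static uint8_t dscp_to_tos(unsigned dscp, unsigned ecn)
{
	assert(dscp <= 63 && ecn <= 3);
	return (uint8_t)((dscp << 2) | ecn);
}
```

For example, vlan_p 3 with vlan_id 77 gives a TCI of 0x604D, and "tos 28" is dscp_to_tos(10, 0) written in hex.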
diff --git a/Documentation/nommu-mmap.txt b/Documentation/nommu-mmap.txt
index b88ebe4d808c..7714f57caad5 100644
--- a/Documentation/nommu-mmap.txt
+++ b/Documentation/nommu-mmap.txt
@@ -116,6 +116,9 @@ FURTHER NOTES ON NO-MMU MMAP
 (*) A list of all the mappings on the system is visible through /proc/maps in
     no-MMU mode.
+ (*) A list of all the mappings in use by a process is visible through
+     /proc/<pid>/maps in no-MMU mode.
 (*) Supplying MAP_FIXED or a requesting a particular mapping address will
     result in an error.
@@ -125,6 +128,49 @@ FURTHER NOTES ON NO-MMU MMAP
     error will result if they don't. This is most likely to be encountered
     with character device files, pipes, fifos and sockets.
+==========================
+INTERPROCESS SHARED MEMORY
+==========================
+Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
+mode: the former through the usual mechanism, the latter through files created
+on ramfs or tmpfs mounts.
+=======
+FUTEXES
+=======
+Futexes are supported in NOMMU mode if the arch supports them.  An error will
+be given if an address passed to the futex system call lies outside the
+mappings made by a process or if the mapping in which the address lies does not
+support futexes (such as an I/O chardev mapping).
+=============
+NO-MMU MREMAP
+=============
+The mremap() function is partially supported.  It may change the size of a
+mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size
+of the mapping exceeds the size of the slab object currently occupied by the
+memory to which the mapping refers, or if a smaller slab object could be used.
+MREMAP_FIXED is not supported, though it is ignored if there's no change of
+address and the object does not need to be moved.
+Shared mappings may not be moved.  Shareable mappings may not be moved either,
+even if they are not currently shared.
+The mremap() function must be given an exact match for base address and size of
+a previously mapped object.  It may not be used to create holes in existing
+mappings, move parts of existing mappings or resize parts of mappings.  It must
+act on a complete mapping.
+[*] Not currently supported.
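The rules above correspond to a call pattern like the following: pass the exact base address and size, act on the whole mapping, and use MREMAP_MAYMOVE rather than MREMAP_FIXED. It is a sketch for an ordinary Linux system, and the helper name is hypothetical. It resizes an anonymous private mapping as a whole, which is the form the no-MMU rules permit. On no-MMU, the call may still fail if the resize would require actually moving the mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an anonymous private region, then resize the whole mapping with
 * MREMAP_MAYMOVE.  Returns 0 if the data survived the resize, -1 otherwise. */
static int resize_whole_mapping(size_t old_len, size_t new_len)
{
	char *p = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 'x';

	/* Exact base and exact original length: the mapping is resized as
	 * a whole, never split.  MREMAP_FIXED is deliberately not used. */
	char *q = mremap(p, old_len, new_len, MREMAP_MAYMOVE);
	if (q == MAP_FAILED) {
		munmap(p, old_len);
		return -1;
	}
	int ok = (q[0] == 'x');
	munmap(q, new_len);
	return ok ? 0 : -1;
}
```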
 ============================================
 PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT
 ============================================
diff --git a/Documentation/pcieaer-howto.txt b/Documentation/pcieaer-howto.txt
new file mode 100644
index 000000000000..16c251230c82
--- /dev/null
+++ b/Documentation/pcieaer-howto.txt
@@ -0,0 +1,253 @@
+   The PCI Express Advanced Error Reporting Driver Guide HOWTO
+                T. Long Nguyen  <tom.l.nguyen@intel.com>
+                Yanmin Zhang    <yanmin.zhang@intel.com>
+                                07/29/2006
+1. Overview
+1.1 About this guide
+This guide describes the basics of the PCI Express Advanced Error
+Reporting (AER) driver and provides information on how to use it, as
+well as how to enable the drivers of endpoint devices to conform with
+the PCI Express AER driver.
+1.2 Copyright © Intel Corporation 2006.
+1.3 What is the PCI Express AER Driver?
+PCI Express error signaling can occur on the PCI Express link itself
+or on behalf of transactions initiated on the link. PCI Express
+defines two error reporting paradigms: the baseline capability and
+the Advanced Error Reporting capability. The baseline capability is
+required of all PCI Express components providing a minimum defined
+set of error reporting requirements. Advanced Error Reporting
+capability is implemented with a PCI Express advanced error reporting
+extended capability structure providing more robust error reporting.
+The PCI Express AER driver provides the infrastructure to support PCI
+Express Advanced Error Reporting capability. The PCI Express AER
+driver provides three basic functions:
+-       Gathers comprehensive error information when errors occur.
+-       Reports errors to the user.
+-       Performs error recovery actions.
+The AER driver attaches only to root ports that support the PCI Express
+AER capability.
+2. User Guide
+2.1 Include the PCI Express AER Root Driver into the Linux Kernel
+The PCI Express AER Root driver is a Root Port service driver attached
+to the PCI Express Port Bus driver. To use it, the driver has to be
+built into the kernel via the option CONFIG_PCIEAER. It depends on
+CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and
+CONFIG_PCIEAER=y.
+2.2 Load PCI Express AER Root Driver
+Some systems have AER support in the BIOS. Enabling the AER Root driver
+while the BIOS also provides AER support may result in unpredictable
+behavior. To avoid this conflict, a successful load of the AER Root driver
+requires ACPI _OSC support in the BIOS, which allows the AER Root driver
+to request native control of AER. See the PCI FW 3.0 Specification for
+details regarding _OSC usage. Currently, many firmware implementations
+don't provide _OSC support even though they use PCI Express. To support
+such firmware, forceload, a parameter of type bool, forces the AER driver
+to initialize even though the firmware has no _OSC support. To enable
+this workaround, add aerdriver.forceload=y to the kernel boot parameter
+line. Note that forceload=n by default.
+2.3 AER error output
+When a PCI Express AER error is captured, an error message is printed to
+the console. Correctable errors are printed as warnings; all other errors
+are printed as errors, so users can choose a log level that filters out
+correctable error messages.
+An example is shown below.
+------ PCI-Express Device Error -----+
+Error Severity          : Uncorrected (Fatal)
+PCIE Bus Error type     : Transaction Layer
+Unsupported Request     : First
+Requester ID            : 0500
+VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h
+TLB Header:
+04000001 00200a03 05010000 00050100
+In the example, 'Requester ID' is the ID of the device that sent the
+error message to the root port. Please refer to the PCI Express
+specification for the other fields.
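The Requester ID uses the standard PCI bus/device/function layout. Bits 15:8 hold the bus number, bits 7:3 the device and bits 2:0 the function. This is how 0500 decodes to Bus=05h, Device=00h, Function=00h in the output above. A small decoder, with hypothetical names, might look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct bdf {
	unsigned bus, device, function;
};

/* Split a 16-bit Requester ID into bus/device/function. */
static struct bdf decode_requester_id(uint16_t id)
{
	struct bdf r;
	r.bus      = id >> 8;           /* bits 15:8 */
	r.device   = (id >> 3) & 0x1f;  /* bits 7:3 */
	r.function = id & 0x7;          /* bits 2:0 */
	return r;
}

/* Format as "bus:device.function" in hex, the notation lspci uses. */
static void format_bdf(uint16_t id, char *buf, size_t len)
{
	struct bdf r = decode_requester_id(id);
	snprintf(buf, len, "%02x:%02x.%x", r.bus, r.device, r.function);
}
```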
+3. Developer Guide
+Enabling AER-aware support requires a software driver to configure
+the AER capability structure within its device and to provide callbacks.
+To support AER well, developers first need to understand how AER
+works.
+PCI Express errors are classified into two types: correctable errors
+and uncorrectable errors. This classification is based on the impacts
+of those errors, which may result in degraded performance or function
+failure.
+Correctable errors have no impact on the functionality of the
+interface. The PCI Express protocol can recover without any software
+intervention or any loss of data. These errors are detected and
+corrected by hardware. Unlike correctable errors, uncorrectable
+errors impact functionality of the interface. Uncorrectable errors
+can cause a particular transaction or a particular PCI Express link
+to be unreliable. Depending on those error conditions, uncorrectable
+errors are further classified into non-fatal errors and fatal errors.
+Non-fatal errors cause the particular transaction to be unreliable,
+but the PCI Express link itself is fully functional. Fatal errors, on
+the other hand, cause the link to be unreliable.
+When AER is enabled, a PCI Express device will automatically send an
+error message to the PCIE root port above it when the device captures
+an error. The Root Port, upon receiving an error reporting message,
+internally processes and logs the error message in its PCI Express
+capability structure. Error information being logged includes storing
+the error reporting agent's requestor ID into the Error Source
+Identification Registers and setting the error bits of the Root Error
+Status Register accordingly. If AER error reporting is enabled in Root
+Error Command Register, the Root Port generates an interrupt if an
+error is detected.
+Note that the errors as described above are related to the PCI Express
+hierarchy and links. These errors do not include any device specific
+errors because device specific errors will still get sent directly to
+the device driver.
+3.1 Configure the AER capability structure
+AER-aware drivers of PCI Express components need to change the device
+control registers to enable AER. They can also change AER registers,
+including the mask and severity registers. The helper function
+pci_enable_pcie_error_reporting can be used to enable AER. See
+section 3.3.
+3.2. Provide callbacks
+3.2.1 Callback reset_link to reset the PCI Express link
+This callback is used to reset the PCI Express physical link when a
+fatal error happens. The root port AER service driver provides a
+default reset_link function, but different upstream ports might
+have different ways to reset the PCI Express link, so all
+upstream ports should provide their own reset_link functions.
+In struct pcie_port_service_driver, a new pointer, reset_link, is
+added.
+pci_ers_result_t (*reset_link) (struct pci_dev *dev);
+Section 3.2.2.2 provides more detailed info on when to call
+reset_link.
+3.2.2 PCI error-recovery callbacks
+The PCI Express AER Root driver uses error callbacks to coordinate
+with downstream device drivers associated with a hierarchy in question
+when performing error recovery actions.
+struct pci_driver has a pointer, err_handler, that points to a
+struct pci_error_handlers, which consists of several callback function
+pointers. The AER driver follows the rules defined in
+pci-error-recovery.txt except for the PCI Express specific parts (e.g.
+reset_link). Please refer to pci-error-recovery.txt for detailed
+definitions of the callbacks.
+The sections below specify when the error callback functions are called.
+3.2.2.1 Correctable errors
+Correctable errors have no impact on the functionality of
+the interface. The PCI Express protocol can recover without any
+software intervention or any loss of data. These errors do not
+require any recovery actions. The AER driver clears the device's
+correctable error status register accordingly and logs these errors.
+3.2.2.2 Non-correctable (non-fatal and fatal) errors
+If an error message indicates a non-fatal error, no link reset is
+required upstream. The AER driver calls error_detected(dev,
+pci_channel_io_normal) for all drivers associated with the hierarchy in
+question. For example, in
+EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
+if Upstream Port A captures an AER error, the hierarchy consists of
+Downstream Port B and EndPoint.
+A driver may return PCI_ERS_RESULT_CAN_RECOVER,
+PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on
+whether it can recover; if it returns PCI_ERS_RESULT_CAN_RECOVER, the
+AER driver calls mmio_enabled next.
+If an error message indicates a fatal error, the kernel broadcasts
+error_detected(dev, pci_channel_io_frozen) to all drivers within the
+hierarchy in question. A link reset upstream is then necessary. As
+different kinds of devices might use different approaches to reset the
+link, the AER port service driver is required to provide a function to
+reset the link. First, the kernel checks whether the upstream component
+has an AER driver. If it does, the kernel uses the reset_link callback
+of that AER driver. If the upstream component has no AER driver and the
+port is a downstream port, the kernel uses the AER driver of the root
+port that reported the AER error. Upstream ports should provide their
+own AER service drivers with a reset_link function. If error_detected
+returns PCI_ERS_RESULT_CAN_RECOVER and reset_link returns
+PCI_ERS_RESULT_RECOVERED, error handling proceeds to mmio_enabled.
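The decisions described in 3.2.2.1 and 3.2.2.2 can be summarized as a small decision model. It is a hypothetical userspace sketch, not the AER driver. The enums are invented stand-ins for the pci_ers_result values. Outcomes that this document leaves to pci-error-recovery.txt are lumped together as STEP_OTHER. pick_reset_link() also folds in the FAQ answer that fatal error recovery fails when an upstream port provides no reset_link.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; NOT the kernel's enums. */
enum aer_severity { AER_CORRECTABLE, AER_NONFATAL, AER_FATAL };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };
enum next_step {
	STEP_LOG_AND_CLEAR,  /* correctable: clear status register and log */
	STEP_MMIO_ENABLED,   /* proceed to the mmio_enabled callbacks */
	STEP_OTHER,          /* governed by pci-error-recovery.txt */
};

/* detected: what error_detected() returned; reset: what reset_link()
 * returned (only consulted for fatal errors). */
static enum next_step aer_next_step(enum aer_severity sev,
				    enum ers_result detected,
				    enum ers_result reset)
{
	switch (sev) {
	case AER_CORRECTABLE:
		return STEP_LOG_AND_CLEAR;   /* no recovery action needed */
	case AER_NONFATAL:
		/* error_detected(dev, pci_channel_io_normal), no link reset */
		return detected == ERS_CAN_RECOVER ? STEP_MMIO_ENABLED
						   : STEP_OTHER;
	case AER_FATAL:
		/* error_detected(dev, pci_channel_io_frozen), then reset_link */
		return (detected == ERS_CAN_RECOVER && reset == ERS_RECOVERED)
			? STEP_MMIO_ENABLED : STEP_OTHER;
	}
	return STEP_OTHER;
}

/* Which reset_link is used for a fatal error. */
enum reset_provider { RESET_UPSTREAM_DRIVER, RESET_ROOT_PORT_DRIVER, RESET_NONE };

static enum reset_provider pick_reset_link(bool upstream_has_reset_link,
					   bool port_is_downstream)
{
	if (upstream_has_reset_link)
		return RESET_UPSTREAM_DRIVER;
	if (port_is_downstream)
		return RESET_ROOT_PORT_DRIVER;  /* root port that reported it */
	return RESET_NONE;  /* upstream port without reset_link: recovery fails */
}
```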
+3.3 Helper functions
+3.3.1 int pci_find_aer_capability(struct pci_dev *dev);
+pci_find_aer_capability locates the PCI Express AER capability
+in the device configuration space. If the device doesn't support
+PCI-Express AER, the function returns 0.
+3.3.2 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
+pci_enable_pcie_error_reporting enables the device to send error
+messages to the root port when an error is detected. Note that devices
+don't enable error reporting by default, so device drivers need to
+call this function to enable it.
+3.3.3 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
+pci_disable_pcie_error_reporting prevents the device from sending error
+messages to the root port when an error is detected.
+3.3.4 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
+pci_cleanup_aer_uncorrect_error_status cleans up the uncorrectable
+error status register.
+3.4 Frequently Asked Questions
+Q: What happens if a PCI Express device driver does not provide an
+error recovery handler (pci_driver->err_handler is equal to NULL)?
+A: The devices attached to the driver won't be recovered. If the
+error is fatal, the kernel prints warning messages. Please refer
+to section 3 for more information.
+Q: What happens if an upstream port service driver does not provide
+callback reset_link?
+A: Fatal error recovery will fail if the errors are reported by the
+upstream ports to which the service driver is attached.
+Q: How does this infrastructure deal with a driver that is not PCI
+Express aware?
+A: This infrastructure calls the error callback functions of the
+driver when an error happens. But if the driver is not aware of
+PCI Express, the device might not report its own errors to root
+port.
+Q: What modifications will that driver need to make it compatible
+with the PCI Express AER Root driver?
+A: It could call the helper functions to enable AER in devices and
+clean up the uncorrectable error status register. Please refer to section 3.3.
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index fba1e05c47c7..d0e79d5820a5 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -1,208 +1,553 @@
+Most of the code in Linux is device drivers, so most of the Linux power
+management code is also driver-specific.  Most drivers will do very little;
+others, especially for platforms with small batteries (like cell phones),
+will do a lot.
+This writeup gives an overview of how drivers interact with system-wide
+power management goals, emphasizing the models and interfaces that are
+shared by everything that hooks up to the driver model core.  Read it as
+background for the domain-specific work you'd do with any specific driver.
+Two Models for Device Power Management
+======================================
+Drivers will use one or both of these models to put devices into low-power
+states:
+    System Sleep model:
+        Drivers can enter low power states as part of entering system-wide
+        low-power states like "suspend-to-ram", or (mostly for systems with
+        disks) "hibernate" (suspend-to-disk).
+        This is something that device, bus, and class drivers collaborate on
+        by implementing various role-specific suspend and resume methods to
+        cleanly power down hardware and software subsystems, then reactivate
+        them without loss of data.
+        Some drivers can manage hardware wakeup events, which make the system
+        leave that low-power state.  This feature may be disabled using the
+        relevant /sys/devices/.../power/wakeup file; enabling it may cost some
+        power usage, but lets the whole system enter low power states more often.
+    Runtime Power Management model:
+        Drivers may also enter low power states while the system is running,
+        independently of other power management activity.  Upstream drivers
+        will normally not know (or care) if the device is in some low power
+        state when issuing requests; the driver will auto-resume anything
+        that's needed when it gets a request.
+        This doesn't have, or need, much infrastructure; it's just something you
+        should do when writing your drivers.  For example, call clk_disable() on
+        unused clocks as part of minimizing power drain for currently-unused hardware.
+        Of course, sometimes clusters of drivers will collaborate with each
+        other, which could involve task-specific power management.
+There's not a lot to be said about those low power states except that they
+are very system-specific, and often device-specific.  Also, if enough
+drivers put themselves into low power states (at "runtime"), the effect may be
+the same as entering some system-wide low-power state (system sleep) ... and
+synergies exist, so that several drivers using runtime PM might put the
+system into a state where even deeper power saving options are available.
+Most suspended devices will have quiesced all I/O:  no more DMA or irqs, no
+more data read or written, and requests from upstream drivers are no longer
+accepted.  A given bus or platform may have different requirements though.
+Examples of hardware wakeup events include an alarm from a real time clock,
+network wake-on-LAN packets, keyboard or mouse activity, and media insertion
+or removal (for PCMCIA, MMC/SD, USB, and so on).
+Interfaces for Entering System Sleep States
+===========================================
+Most of the programming interfaces a device driver needs to know about
+relate to that first model:  entering a system-wide low power state,
+rather than just minimizing power consumption by one device.
+Bus Driver Methods
+------------------
+The core methods to suspend and resume devices reside in struct bus_type.
+These are mostly of interest to people writing infrastructure for busses
+like PCI or USB, or because they define the primitives that device drivers
+may need to apply in domain-specific ways to their devices:
-Device Power Management
+struct bus_type {
+        ...
+        int  (*suspend)(struct device *dev, pm_message_t state);
+        int  (*suspend_late)(struct device *dev, pm_message_t state);
+        int  (*resume_early)(struct device *dev);
+        int  (*resume)(struct device *dev);
+};
-Device power management encompasses two areas - the ability to save
+Bus drivers implement those methods as appropriate for the hardware and
-state and transition a device to a low-power state when the system is
+the drivers using it; PCI works differently from USB, and so on.  Not many
-entering a low-power state; and the ability to transition a device to
+people write bus drivers; most driver code is a "device driver" that
-a low-power state while the system is running (and independently of
+builds on top of bus-specific framework code.
-any other power management activity). 
+For more information on these driver calls, see the description later;
+they are called in phases for every device, respecting the parent-child
+sequencing in the driver model tree.  Note that as this is being written,
+only suspend() and resume() are widely available; not many bus drivers
+leverage all of those phases, or pass them down to lower driver levels.
+/sys/devices/.../power/wakeup files
+-----------------------------------
+All devices in the driver model have two flags to control handling of
+wakeup events, which are hardware signals that can force the device and/or
+system out of a low power state.  These are initialized by bus or device
+driver code using device_init_wakeup(dev,can_wakeup).
+The "can_wakeup" flag just records whether the device (and its driver) can
+physically support wakeup events.  When that flag is clear, the sysfs
+"wakeup" file is empty, and device_may_wakeup() returns false.
+For devices that can issue wakeup events, a separate flag controls whether
+that device should try to use its wakeup mechanism.  The initial value of
+device_may_wakeup() will be true, so that the device's "wakeup" file holds
+the value "enabled".  Userspace can change that to "disabled" so that
+device_may_wakeup() returns false; or change it back to "enabled" (so that
+it returns true again).
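The interaction of the two flags can be modelled as follows. This is a hypothetical userspace sketch of the behavior described above, not the driver model code. It assumes that writes to the "wakeup" file are rejected when the device cannot wake the system. It also ignores the trailing newline that a shell echo would append.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two per-device wakeup flags. */
struct wake_flags {
	bool can_wakeup;     /* hardware and driver can signal wakeup */
	bool should_wakeup;  /* policy, controlled through sysfs */
};

/* Like device_init_wakeup(dev, can): both flags start out equal,
 * so a capable device begins "enabled". */
static void model_init_wakeup(struct wake_flags *w, bool can)
{
	w->can_wakeup = can;
	w->should_wakeup = can;
}

/* Like device_may_wakeup(): true only if capable AND enabled. */
static bool model_may_wakeup(const struct wake_flags *w)
{
	return w->can_wakeup && w->should_wakeup;
}

/* Contents of the sysfs "wakeup" file. */
static const char *model_wakeup_show(const struct wake_flags *w)
{
	if (!w->can_wakeup)
		return "";                  /* empty when wakeup is unsupported */
	return w->should_wakeup ? "enabled" : "disabled";
}

/* A write from userspace; returns 0 on success, -1 otherwise.
 * Assumption: writes are refused for devices that cannot wake. */
static int model_wakeup_store(struct wake_flags *w, const char *buf)
{
	if (!w->can_wakeup)
		return -1;
	if (strcmp(buf, "enabled") == 0)
		w->should_wakeup = true;
	else if (strcmp(buf, "disabled") == 0)
		w->should_wakeup = false;
	else
		return -1;
	return 0;
}
```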
+EXAMPLE:  PCI Device Driver Methods
+-----------------------------------
+PCI framework software calls these methods when the PCI device driver bound
+to a device has provided them:
+struct pci_driver {
+        ...
+        int  (*suspend)(struct pci_dev *pdev, pm_message_t state);
+        int  (*suspend_late)(struct pci_dev *pdev, pm_message_t state);
+        int  (*resume_early)(struct pci_dev *pdev);
+        int  (*resume)(struct pci_dev *pdev);
+};
+Drivers will implement those methods, and call PCI-specific procedures
+like pci_set_power_state(), pci_enable_wake(), pci_save_state(), and
+pci_restore_state() to manage PCI-specific mechanisms.  (PCI config space
+could be saved during driver probe, if it weren't for the fact that some
+systems rely on userspace tweaking using setpci.)  Devices are suspended
+before their bridges enter low power states, and likewise bridges resume
+before their devices.
+Upper Layers of Driver Stacks
+-----------------------------
+Device drivers generally have at least two interfaces, and the methods
+sketched above are the ones which apply to the lower level (nearer PCI, USB,
+or other bus hardware).  The network and block layers are examples of upper
+level interfaces, as is a character device talking to userspace.
+Power management requests normally need to flow through those upper levels,
+which often use domain-oriented requests like "blank that screen".  In
+some cases those upper levels will have power management intelligence that
+relates to end-user activity, or other devices that work in cooperation.
+When those interfaces are structured using class interfaces, there is a
+standard way to have the upper layer stop issuing requests to a given
+class device (and restart later):
+struct class {
+        ...
+        int  (*suspend)(struct device *dev, pm_message_t state);
+        int  (*resume)(struct device *dev);
+};
-Methods
+Those calls are issued in specific phases of the process by which the
+system enters a low power "suspend" state, or resumes from it.
+Calling Drivers to Enter System Sleep States
+============================================
+When the system enters a low power state, each device's driver is asked
+to suspend the device by putting it into a state compatible with the target
+system state.  That's usually some version of "off", but the details are
+system-specific.  Also, wakeup-enabled devices will usually stay partly
+functional in order to wake the system.
+When the system leaves that low power state, the device's driver is asked
+to resume it.  The suspend and resume operations always go together, and
+both are multi-phase operations.
+For simple drivers, suspend might quiesce the device using the class code
+and then turn its hardware as "off" as possible with suspend_late.  The
+matching resume calls would then completely reinitialize the hardware
+before reactivating its class I/O queues.
+More power-aware drivers will use more than one device low power
+state, either at runtime or during system sleep states, and might trigger
+system wakeup events.
+Call Sequence Guarantees
+------------------------
+To ensure that bridges and similar links needed to talk to a device are
+available when the device is suspended or resumed, the device tree is
+walked in a bottom-up order to suspend devices.  A top-down order is
+used to resume those devices.
+The ordering of the device tree is defined by the order in which devices
+get registered:  a child can never be registered, probed or resumed before
+its parent; and can't be removed or suspended after that parent.
+The policy is that the device tree should match hardware bus topology.
+(Or at least the control bus, for devices which use multiple busses.)
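The ordering rule can be sketched with devices kept in registration order. Because a parent is always registered before its children, walking that list backwards suspends children first, and walking it forwards resumes parents first. This is a simplified, hypothetical model of the rule stated above, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical model: devices are stored in registration order and
 * refer to their parent by index (-1 for a root). */
struct mdev {
	const char *name;
	int parent;
};

/* The registration rule: a parent must come before its children. */
static int registration_is_valid(const struct mdev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].parent >= (int)i)
			return 0;
	return 1;
}

/* Suspend walks the registration list backwards (bottom-up)... */
static void suspend_order(size_t n, int order[])
{
	for (size_t i = 0; i < n; i++)
		order[i] = (int)(n - 1 - i);
}

/* ...and resume walks it forwards (top-down). */
static void resume_order(size_t n, int order[])
{
	for (size_t i = 0; i < n; i++)
		order[i] = (int)i;
}

/* Returns 1 if every device appears in order[] before its parent. */
static int children_first(const struct mdev *devs, size_t n, const int order[])
{
	int pos[MAX_DEVS];

	assert(n <= MAX_DEVS);
	for (size_t i = 0; i < n; i++)
		pos[order[i]] = (int)i;
	for (size_t i = 0; i < n; i++)
		if (devs[i].parent >= 0 && pos[i] > pos[devs[i].parent])
			return 0;
	return 1;
}
```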
+Suspending Devices
+------------------
+Suspending a given device is done in several phases.  Suspending the
+system always includes every phase, executing calls for every device
+before the next phase begins.  Not all busses or classes support all
+these callbacks; and not all drivers use all the callbacks.
+The phases are seen by driver notifications issued in this order:
+   1    class.suspend(dev, message) is called after tasks are frozen, for
+        devices associated with a class that has such a method.  This
+        method may sleep.
+        Since I/O activity usually comes from such higher layers, this is
+        a good place to quiesce all drivers of a given type (and keep such
+        code out of those drivers).
+   2    bus.suspend(dev, message) is called next.  This method may sleep,
+        and is often morphed into a device driver call with bus-specific
+        parameters and/or rules.
+        This call should handle parts of device suspend logic that require
+        sleeping.  It typically does whatever work is needed to quiesce the
+        device that hasn't been abstracted into class.suspend() or
+        bus.suspend_late().
+   3    bus.suspend_late(dev, message) is called with IRQs disabled, and
+        with only one CPU active.  Until the bus.resume_early() phase
+        completes (see later), IRQs are not enabled again.  This method
+        won't be exposed by all busses; for message-based busses like USB,
+        I2C, or SPI, device interactions normally require IRQs.  This bus
+        call may be morphed into a driver call with bus-specific parameters.
+        This call might save low level hardware state that might otherwise
+        be lost in the upcoming low power state, and actually put the
+        device into a low power state ... so that in some cases the device
+        may stay partly usable until this late.  This "late" call may also
+        help when coping with hardware that behaves badly.
+The pm_message_t parameter is currently used to refine those semantics
+(described later).
+At the end of those phases, drivers should normally have stopped all I/O
+transactions (DMA, IRQs), saved enough state that they can re-initialize
+or restore previous state (as needed by the hardware), and placed the
+device into a low-power state.  On many platforms they will also use
+clk_disable() to gate off one or more clock sources; sometimes they will
+also switch off power supplies, or reduce voltages.  Drivers which have
+runtime PM support may already have performed some or all of the steps
+needed to prepare for the upcoming system sleep state.
+When a driver sees that device_can_wakeup(dev) is true for its device, it
+should make sure to use the relevant hardware signals to trigger a system
+wakeup event.
+For example, enable_irq_wake() might identify GPIO signals hooked up to
+a switch or other external hardware, and pci_enable_wake() does something
+similar for PCI's PME# signal.
+If a driver (or bus, or class) fails its suspend method, the system won't
+enter the desired low power state; it will resume all the devices it has
+suspended so far.
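That failure rule can be sketched as a hypothetical userspace model for a single phase. Here `ret[i]` stands in for whatever device i's suspend method would return (0 or a negative errno), and `suspended[i]` tracks its state. Devices are indexed in registration order. The function name is invented for illustration.

```c
#include <assert.h>

/* Hypothetical model of one suspend phase with rollback on failure.
 * ret[i]: what device i's suspend method returns (0 or -errno).
 * suspended[i]: 1 while device i is suspended. */
static int model_suspend_or_rollback(int n, const int *ret, int *suspended)
{
	for (int i = n - 1; i >= 0; i--) {		/* bottom-up */
		assert(!suspended[i]);
		if (ret[i] != 0) {
			/* Devices i+1 .. n-1 are already suspended: resume them
			 * top-down (parents first), then report the failure. */
			for (int j = i + 1; j < n; j++)
				suspended[j] = 0;
			return ret[i];
		}
		suspended[i] = 1;
	}
	return 0;	/* every device suspended; the system may sleep */
}
```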
+Note that drivers may need to perform different actions based on the target
+system lowpower/sleep state.  At this writing, there are only
+platform-specific APIs through which drivers could determine those target
+states.
+Device Low Power (suspend) States
+---------------------------------
+Device low-power states aren't very standard.  One device might only handle
+"on" and "off", while another might support a dozen different versions of
+"on" (how many engines are active?), plus a state that gets back to "on"
+faster than from a full "off".
+Some busses define rules about what different suspend states mean.  PCI
+gives one example:  after the suspend sequence completes, a non-legacy
+PCI device may not perform DMA or issue IRQs, and any wakeup events it
+issues would be issued through the PME# bus signal.  Plus, there are
+several PCI-standard device states, some of which are optional.
+In contrast, integrated system-on-chip processors often use IRQs as the
+wakeup event sources (so drivers would call enable_irq_wake()) and might
+be able to treat DMA completion as a wakeup event (sometimes DMA can stay
+active too; only the CPU and some peripherals would sleep).
+Some details here may be platform-specific.  Systems may have devices that
+can be fully active in certain sleep states, such as an LCD display that's
+refreshed using DMA while most of the system is sleeping lightly ... and
+its frame buffer might even be updated by a DSP or other non-Linux CPU while
+the Linux control processor stays idle.
+Moreover, the specific actions taken may depend on the target system state.
+One target system state might allow a given device to be very operational;
+another might require a hard shut down with re-initialization on resume.
+And two different target systems might use the same device in different
+ways; the aforementioned LCD might be active in one product's "standby",
+but a different product using the same SOC might work differently.
+Meaning of pm_message_t.event
+-----------------------------
+Parameters to suspend calls include the device affected and a message of
+type pm_message_t, which has one field:  the event.  If a driver does not
+recognize the event code, suspend calls may abort the request and return
+a negative errno.  However, most drivers will be fine if they implement
+PM_EVENT_SUSPEND semantics for all messages.
+The event codes are used to refine the goal of suspending the device, and
+mostly matter when creating or resuming system memory image snapshots, as
+used with suspend-to-disk:
+    PM_EVENT_SUSPEND -- quiesce the driver and put hardware into a low-power
+        state.  When used with system sleep states like "suspend-to-RAM" or
+        "standby", the upcoming resume() call will often be able to rely on
+        state kept in hardware, or issue system wakeup events.  When used
+        instead with suspend-to-disk, few devices support this capability;
+        most are completely powered off.
+    PM_EVENT_FREEZE -- quiesce the driver, but don't necessarily change into
+        any low power mode.  A system snapshot is about to be taken, often
+        followed by a call to the driver's resume() method.  Neither wakeup
+        events nor DMA are allowed.
+    PM_EVENT_PRETHAW -- quiesce the driver, knowing that the upcoming resume()
+        will restore a suspend-to-disk snapshot from a different kernel image.
+        Drivers that are smart enough to look at their hardware state during
+        resume() processing need that state to be correct ... a PRETHAW could
+        be used to invalidate that state (by resetting the device), like a
+        shutdown() invocation would before a kexec() or system halt.  Other
+        drivers might handle this the same way as PM_EVENT_FREEZE.  Neither
+        wakeup events nor DMA are allowed.
+To enter "standby" (ACPI S1) or "Suspend to RAM" (STR, ACPI S3) states, or
+the similarly named APM states, only PM_EVENT_SUSPEND is used; for "Suspend
+to Disk" (STD, hibernate, ACPI S4), all of those event codes are used.
+There's also PM_EVENT_ON, a value which never appears as a suspend event
+but is sometimes used to record the "not suspended" device state.
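The sketch below shows how a driver's suspend method might branch on the event code. It is a userspace model. The `MODEL_PM_EVENT_*` values and `struct model_hw` are stand-ins defined here, not the kernel's PM_EVENT_* constants or any real driver structure. The hardware is reduced to a few flags.

```c
#include <errno.h>

/* Local stand-ins for the event codes described above; these are not
 * the kernel's definitions and their values are arbitrary. */
enum model_pm_event {
	MODEL_PM_EVENT_ON,
	MODEL_PM_EVENT_FREEZE,
	MODEL_PM_EVENT_SUSPEND,
	MODEL_PM_EVENT_PRETHAW,
};

/* Hypothetical hardware state, reduced to a few flags. */
struct model_hw {
	int dma_active;
	int io_irqs;		/* I/O completion IRQs enabled */
	int low_power;		/* hardware placed in a low-power state */
	int state_invalid;	/* resume() must not trust hardware state */
};

/* Hypothetical driver suspend method: returns 0 or a negative errno. */
static int model_suspend(struct model_hw *hw, int event)
{
	switch (event) {
	case MODEL_PM_EVENT_SUSPEND:
		/* Quiesce, then enter a low-power state. */
		hw->dma_active = 0;
		hw->io_irqs = 0;
		hw->low_power = 1;
		return 0;
	case MODEL_PM_EVENT_PRETHAW:
		/* A different kernel image will resume us: reset the device
		 * so resume() won't rely on stale hardware state. */
		hw->state_invalid = 1;
		/* fall through */
	case MODEL_PM_EVENT_FREEZE:
		/* Quiesce only: no DMA, no IRQs, power state unchanged. */
		hw->dma_active = 0;
		hw->io_irqs = 0;
		return 0;
	default:
		/* PM_EVENT_ON never appears here; reject anything unknown. */
		return -EINVAL;
	}
}
```

Rejecting unknown codes is one of the two options described above. A simpler driver could route every message to the SUSPEND branch instead.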
+Resuming Devices
+----------------
+Resuming is done in multiple phases, much like suspending, with all
+devices processing each phase's calls before the next phase begins.
+Drivers see these phases as notifications issued in this order:
+   1    bus.resume_early(dev) is called with IRQs disabled, and with
+        only one CPU active.  As with bus.suspend_late(), this method
+        won't be supported on busses that require IRQs in order to
+        interact with devices.
+        This reverses the effects of bus.suspend_late().
+   2    bus.resume(dev) is called next.  This may be morphed into a device
+        driver call with bus-specific parameters; implementations may sleep.
+        This reverses the effects of bus.suspend().
+   3    class.resume(dev) is called for devices associated with a class
+        that has such a method.  Implementations may sleep.
+        This reverses the effects of class.suspend(), and would usually
+        reactivate the device's I/O queue.
+At the end of those phases, drivers should normally be as functional as
+they were before suspending:  I/O can be performed using DMA and IRQs, and
+the relevant clocks are gated on.  The device need not be "fully on"; it
+might be in a runtime lowpower/suspend state that acts as if it were.
+However, the details here may again be platform-specific.  For example,
+some systems support multiple "run" states, and the mode in effect at
+the end of resume() might not be the one which preceded suspension.
+That means the availability of certain clocks or power supplies may have
+changed, which could easily affect how a driver works.
+Drivers need to be able to handle hardware which has been reset since the
+suspend methods were called, for example by complete reinitialization.
+This may be the hardest part, and the one most protected by NDA'd documents
+and chip errata.  It's simplest if the hardware state hasn't changed since
+the suspend() was called, but that can't always be guaranteed.
+Drivers must also be prepared to notice that the device has been removed
+while the system was powered off, whenever that's physically possible.
+PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses
+where common Linux platforms will see such removal.  Details of how drivers
+will notice and handle such removals are currently bus-specific, and often
+involve a separate thread.
-The methods to suspend and resume devices reside in struct bus_type: 
-struct bus_type {
+Note that a bus-specific runtime PM wakeup mechanism may exist, and might
-       ...
+be defined to share some of the same driver code as for system wakeup.  For
-       int             (*suspend)(struct device * dev, pm_message_t state);
+example, a bus-specific device driver's resume() method might be used there,
-       int             (*resume)(struct device * dev);
+so it wouldn't only be called from bus.resume() during system-wide wakeup.
-};
+See bus-specific information about how runtime wakeup events are handled.
-Each bus driver is responsible implementing these methods, translating
-the call into a bus-specific request and forwarding the call to the
-bus-specific drivers. For example, PCI drivers implement suspend() and
-resume() methods in struct pci_driver. The PCI core is simply
-responsible for translating the pointers to PCI-specific ones and
-calling the low-level driver.
-This is done to a) ease transition to the new power management methods
-and leverage the existing PM code in various bus drivers; b) allow
-buses to implement generic and default PM routines for devices, and c)
-make the flow of execution obvious to the reader. 
-System Power Management
-When the system enters a low-power state, the device tree is walked in
-a depth-first fashion to transition each device into a low-power
-state. The ordering of the device tree is guaranteed by the order in
-which devices get registered - children are never registered before
-their ancestors, and devices are placed at the back of the list when
-registered. By walking the list in reverse order, we are guaranteed to
-suspend devices in the proper order. 
-Devices are suspended once with interrupts enabled. Drivers are
-expected to stop I/O transactions, save device state, and place the
-device into a low-power state. Drivers may sleep, allocate memory,
-etc. at will. 
-Some devices are broken and will inevitably have problems powering
-down or disabling themselves with interrupts enabled. For these
-special cases, they may return -EAGAIN. This will put the device on a
-list to be taken care of later. When interrupts are disabled, before
-we enter the low-power state, their drivers are called again to put
-their device to sleep. 
-On resume, the devices that returned -EAGAIN will be called to power
-themselves back on with interrupts disabled. Once interrupts have been
-re-enabled, the rest of the drivers will be called to resume their
-devices. On resume, a driver is responsible for powering back on each
-device, restoring state, and re-enabling I/O transactions for that
-device. 
+System Devices
+--------------
 System devices follow a slightly different API, which can be found in
        include/linux/sysdev.h
        drivers/base/sys.c
-System devices will only be suspended with interrupts disabled, and
+System devices will only be suspended with interrupts disabled, and after
-after all other devices have been suspended. On resume, they will be
+all other devices have been suspended.  On resume, they will be resumed
-resumed before any other devices, and also with interrupts disabled.
+before any other devices, and also with interrupts disabled.
+That is, IRQs are disabled, the suspend_late() phase begins, then the
+sysdev_driver.suspend() phase, and the system enters a sleep state.  Then
+the sysdev_driver.resume() phase begins, followed by the resume_early()
+phase, after which IRQs are enabled.
-Runtime Power Management
+Code to actually enter and exit the system-wide low power state sometimes
+involves hardware details that are only known to the boot firmware, and
-Many devices are able to dynamically power down while the system is
+may leave a CPU running software (from SRAM or flash memory) that monitors
-still running. This feature is useful for devices that are not being
+the system and manages its wakeup sequence.
-used, and can offer significant power savings on a running system. 
-In each device's directory, there is a 'power' directory, which
-contains at least a 'state' file. Reading from this file displays what
-power state the device is currently in. Writing to this file initiates
-a transition to the specified power state, which must be a decimal in
-the range 1-3, inclusive; or 0 for 'On'.
-The PM core will call the ->suspend() method in the bus_type object
-that the device belongs to if the specified state is not 0, or
->resume() if it is. 
-Nothing will happen if the specified state is the same state the
+Runtime Power Management
-device is currently in. 
+========================
+Many devices are able to dynamically power down while the system is still
-If the device is already in a low-power state, and the specified state
+running. This feature is useful for devices that are not being used, and
-is another, but different, low-power state, the ->resume() method will
+can offer significant power savings on a running system.  These devices
-first be called to power the device back on, then ->suspend() will be
+often support a range of runtime power states, which might use names such
-called again with the new state. 
+as "off", "sleep", "idle", "active", and so on.  Those states will in some
+cases (like PCI) be partially constrained by a bus the device uses, and will
-The driver is responsible for saving the working state of the device
+usually include hardware states that are also used in system sleep states.
-and putting it into the low-power state specified. If this was
-successful, it returns 0, and the device's power_state field is
+However, note that if a driver puts a device into a runtime low power state
-updated. 
+and the system then goes into a system-wide sleep state, it normally ought
+to resume into that runtime low power state rather than "full on".  Such
-The driver must take care to know whether or not it is able to
+distinctions would be part of the driver-internal state machine for that
-properly resume the device, including all step of reinitialization
+hardware; the whole point of runtime power management is to be sure that
-necessary. (This is the hardest part, and the one most protected by
+drivers are decoupled in that way from the state machine governing phases
-NDA'd documents). 
+of the system-wide power/sleep state transitions.
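A hypothetical sketch of that decoupling follows. The driver keeps its own runtime state; the enum and struct are invented for illustration. Across a system-wide sleep, it restores the state it had chosen instead of forcing "full on".

```c
#include <assert.h>

/* Hypothetical driver-internal runtime power states. */
enum model_rt_state { MODEL_RT_ACTIVE, MODEL_RT_IDLE, MODEL_RT_OFF };

struct model_drv {
	enum model_rt_state cur;	/* current device power state */
	enum model_rt_state saved;	/* runtime state at system suspend */
};

/* System-wide suspend: remember the runtime state, then power down. */
static void model_system_suspend(struct model_drv *d)
{
	d->saved = d->cur;
	d->cur = MODEL_RT_OFF;
}

/* System-wide resume: return to the remembered runtime state, which may
 * well be a low-power one, rather than forcing "full on". */
static void model_system_resume(struct model_drv *d)
{
	assert(d->cur == MODEL_RT_OFF);
	d->cur = d->saved;
}
```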
-The driver must also take care not to suspend a device that is
-currently in use. It is their responsibility to provide their own
+Power Saving Techniques
-exclusion mechanisms.
+-----------------------
+Normally runtime power management is handled by the drivers without specific
-The runtime power transition happens with interrupts enabled. If a
+userspace or kernel intervention, by device-aware use of techniques like:
-device cannot support being powered down with interrupts, it may
-return -EAGAIN (as it would during a system power management
+    Using information provided by other system layers
-transition),  but it will _not_ be called again, and the transaction
+        - stay deeply "off" except between open() and close()
-will fail.
+        - if transceiver/PHY indicates "nobody connected", stay "off"
+        - application protocols may include power commands or hints
-There is currently no way to know what states a device or driver
-supports a priori. This will change in the future. 
+    Using fewer CPU cycles
+        - using DMA instead of PIO
-pm_message_t meaning
+        - removing timers, or making them lower frequency
+        - shortening "hot" code paths
-pm_message_t has two fields. event ("major"), and flags.  If driver
+        - eliminating cache misses
-does not know event code, it aborts the request, returning error. Some
+        - (sometimes) offloading work to device firmware
-drivers may need to deal with special cases based on the actual type
-of suspend operation being done at the system level. This is why
+    Reducing other resource costs
-there are flags.
+        - gating off unused clocks in software (or hardware)
+        - switching off unused power supplies
-Event codes are:
+        - eliminating (or delaying/merging) IRQs
+        - tuning DMA to use word and/or burst modes
-ON -- no need to do anything except special cases like broken
-HW.
+    Using device-specific low power states
+        - using lower voltages
-# NOTIFICATION -- pretty much same as ON?
+        - avoiding needless DMA transfers
-FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
+Read your hardware documentation carefully to see the opportunities that
-scratch. That probably means stop accepting upstream requests, the
+may be available.  If you can, measure the actual power usage and check
-actual policy of what to do with them being specific to a given
+it against the budget established for your project.
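The first technique listed, staying "off" except between open() and close(), amounts to a use count. This is a minimal single-threaded sketch. `struct model_dev_pm` and its functions are hypothetical. A real driver would need locking around the count and actual power-up and power-down code.

```c
#include <assert.h>

/* Hypothetical device that is powered only while someone has it open. */
struct model_dev_pm {
	int open_count;
	int powered;
};

static void model_open(struct model_dev_pm *d)
{
	if (d->open_count++ == 0)
		d->powered = 1;		/* first user: power up */
}

static void model_close(struct model_dev_pm *d)
{
	assert(d->open_count > 0);
	if (--d->open_count == 0)
		d->powered = 0;		/* last user gone: back to "off" */
}
```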
-driver. It's acceptable for a network driver to just drop packets
-while a block driver is expected to block the queue so no request is
-lost. (Use IDE as an example on how to do that). FREEZE requires no
+Examples:  USB hosts, system timer, system CPU
-power state change, and it's expected for drivers to be able to
+----------------------------------------------
-quickly transition back to operating state.
+USB host controllers make interesting, if complex, examples.  In many cases
+these have no work to do:  no USB devices are connected, or all of them are
-SUSPEND -- like FREEZE, but also put hardware into low-power state. If
+in the USB "suspend" state.  Linux host controller drivers can then disable
-there's need to distinguish several levels of sleep, additional flag
+periodic DMA transfers that would otherwise be a constant power drain on the
-is probably best way to do that.
+memory subsystem, and enter a suspend state.  In power-aware controllers,
+entering that suspend state may disable the clock used with USB signaling,
-Transitions are only from a resumed state to a suspended state, never
+saving a certain amount of power.
-between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
-FREEZE -> SUSPEND or SUSPEND -> FREEZE can not).
+The controller will be woken from that state (with an IRQ) by changes to the
+signal state on the data lines of a given port, for example by an existing
-All events are:
+peripheral requesting "remote wakeup" or by plugging a new peripheral.  The
+same wakeup mechanism usually works from "standby" sleep states, and on some
-[NOTE NOTE NOTE: If you are driver author, you should not care; you
+systems also from "suspend to RAM" (or even "suspend to disk") states.
-should only look at event, and ignore flags.]
+(Except that ACPI may be involved instead of normal IRQs, on some hardware.)
-#Prepare for suspend -- userland is still running but we are going to
+System devices like timers and CPUs may have special roles in the platform
-#enter suspend state. This gives drivers chance to load firmware from
+power management scheme.  For example, system timers using a "dynamic tick"
-#disk and store it in memory, or do other activities taht require
+approach don't just save CPU cycles (by eliminating needless timer IRQs),
-#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
+but they may also open the door to using lower power CPU "idle" states that
-#are forbiden once the suspend dance is started.. event = ON, flags =
+cost more than a jiffy to enter and exit.  On x86 systems these are states
-#PREPARE_TO_SUSPEND
+like "C3"; note that periodic DMA transfers from a USB host controller will
+also prevent entry to a C3 state, much like a periodic timer IRQ.
-Apm standby -- prepare for APM event. Quiesce devices to make life
-easier for APM BIOS. event = FREEZE, flags = APM_STANDBY
+That kind of runtime mechanism interaction is common.  "System On Chip" (SOC)
+processors often have low power idle modes that can't be entered unless
-Apm suspend -- same as APM_STANDBY, but it we should probably avoid
+certain medium-speed clocks (often 12 or 48 MHz) are gated off.  When the
-spinning down disks. event = FREEZE, flags = APM_SUSPEND
+drivers gate those clocks effectively, then the system idle task may be able
+to use the lower power idle modes and thereby increase battery life.
-System halt, reboot -- quiesce devices to make life easier for BIOS. event
-= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT
+If the CPU can have a "cpufreq" driver, there also may be opportunities
+to shift to lower voltage settings and reduce the power cost of executing
-System shutdown -- at least disks need to be spun down, or data may be
+a given number of instructions.  (Without voltage adjustment, it's rare
-lost. Quiesce devices, just to make life easier for BIOS. event =
+for cpufreq to save much power; the cost-per-instruction must go down.)
-FREEZE, flags = SYSTEM_SHUTDOWN
-Kexec    -- turn off DMAs and put hardware into some state where new
+/sys/devices/.../power/state files
-kernel can take over. event = FREEZE, flags = KEXEC
+==================================
+For now you can also test some of this functionality using sysfs.
-Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
-may need to be enabled on some devices. This actually has at least 3
+        DEPRECATED:  USE "power/state" ONLY FOR DRIVER TESTING, AND
-subtypes, system can reboot, enter S4 and enter S5 at the end of
+        AVOID USING dev->power.power_state IN DRIVERS.
-swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
-SYSTEM_SHUTDOWN, SYSTEM_S4
+        THESE WILL BE REMOVED.  IF THE "power/state" FILE GETS REPLACED,
+        IT WILL BECOME SOMETHING COUPLED TO THE BUS OR DRIVER.
-Suspend to ram  -- put devices into low power state. event = SUSPEND,
-flags = SUSPEND_TO_RAM
+In each device's directory, there is a 'power' directory, which contains
+at least a 'state' file.  The value of this file is effectively boolean,
-Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
+PM_EVENT_ON or PM_EVENT_SUSPEND.
-devices into low power mode, but you must be able to reinitialize
-device from scratch in resume method. This has two flavors, its done
+   *    Reading from this file displays a value corresponding to
-once on suspending kernel, once on resuming kernel. event = FREEZE,
+        the power.power_state.event field.  All nonzero values are
-flags = DURING_SUSPEND or DURING_RESUME
+        displayed as "2", corresponding to a low power state; zero
+        is displayed as "0", corresponding to normal operation.
-Device detach requested from /sys -- deinitialize device; proably same as
-SYSTEM_SHUTDOWN, I do not understand this one too much. probably event
+   *    Writing to this file initiates a transition using the
-= FREEZE, flags = DEV_DETACH.
+        specified event code number; only '0', '2', and '3' are
+        accepted (without a newline); '2' and '3' are both
-#These are not really events sent:
+        mapped to PM_EVENT_SUSPEND.
-#
-#System fully on -- device is working normally; this is probably never
+On writes, the PM core relies on that recorded event code and the device/bus
-#passed to suspend() method... event = ON, flags = 0
+capabilities to determine whether it uses a partial suspend() or resume()
-#
+sequence to change things so that the recorded event corresponds to the
-#Ready after resume -- userland is now running, again. Time to free any
+numeric parameter.
-#memory you ate during prepare to suspend... event = ON, flags =
-#READY_AFTER_RESUME
+   -    If the bus requires the irqs-disabled suspend_late()/resume_early()
-#
+        phases, writes fail because those operations are not supported here.
+   -    If the recorded value is the expected value, nothing is done.
+   -    If the recorded value is nonzero, the device is partially resumed,
+        using the bus.resume() and/or class.resume() methods.
+   -    If the target value is nonzero, the device is partially suspended,
+        using the class.suspend() and/or bus.suspend() methods and the
+        PM_EVENT_SUSPEND message.
+Drivers have no way to tell whether their suspend() and resume() calls
+have come through the sysfs power/state file or as part of entering a
+system sleep state, except that when accessed through sysfs the normal
+parent/child sequencing rules are ignored.  Drivers (such as bus, bridge,
+or hub drivers) which expose child devices may need to enforce those rules
+on their own.
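The read and write rules above can be restated as small decision functions. This is a hypothetical userspace model, not the kernel's implementation. The `MODEL_*` names are invented, and `recorded` stands in for power.power_state.event. Returning -EINVAL for every failure is an assumption, since the text does not name an error code.

```c
#include <errno.h>
#include <string.h>

/* Bits telling the caller which partial sequences to run (hypothetical). */
#define MODEL_DO_RESUME		0x1	/* bus.resume() and/or class.resume() */
#define MODEL_DO_SUSPEND	0x2	/* class.suspend() and/or bus.suspend() */

/* Local stand-ins for PM_EVENT_ON and PM_EVENT_SUSPEND. */
#define MODEL_EVENT_ON		0
#define MODEL_EVENT_SUSPEND	2

/* Reading power/state: nonzero events show as 2, zero as 0. */
static int model_state_show(int recorded)
{
	return recorded ? 2 : 0;
}

/* Writing power/state: returns a MODEL_DO_* mask, or a negative errno. */
static int model_state_store(const char *buf, int recorded,
			     int bus_needs_late_phases)
{
	int target, actions = 0;

	if (bus_needs_late_phases)
		return -EINVAL;	/* suspend_late/resume_early unsupported here */

	if (strcmp(buf, "0") == 0)
		target = MODEL_EVENT_ON;
	else if (strcmp(buf, "2") == 0 || strcmp(buf, "3") == 0)
		target = MODEL_EVENT_SUSPEND;
	else
		return -EINVAL;	/* includes values with a trailing newline */

	if (recorded == target)
		return 0;	/* already there: nothing to do */
	if (recorded != MODEL_EVENT_ON)
		actions |= MODEL_DO_RESUME;
	if (target != MODEL_EVENT_ON)
		actions |= MODEL_DO_SUSPEND;
	return actions;
}
```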
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
index 4117802af0f8..a66bec222b16 100644
--- a/Documentation/power/interface.txt
+++ b/Documentation/power/interface.txt
@@ -52,3 +52,18 @@ suspend image will be as small as possible.
 Reading from this file will display the current image size limit, which
 is set to 500 MB by default.
+/sys/power/pm_trace controls the code which saves the last PM event point in
+the RTC across reboots, so that you can debug a machine that just hangs
+during suspend (or more commonly, during resume).  Namely, the RTC is only
+used to save the last PM event point if this file contains '1'.  Initially it
+contains '0' which may be changed to '1' by writing a string representing a
+nonzero integer into it.
+To use this debugging feature you should attempt to suspend the machine, then
+reboot it and run
+        dmesg -s 1000000 | grep 'hash matches'
+CAUTION: Using it will cause your machine's real-time (CMOS) clock to be
+set to a random invalid time after a resume.
diff --git a/Documentation/seclvl.txt b/Documentation/seclvl.txt
deleted file mode 100644
index 97274d122d0e..000000000000
--- a/Documentation/seclvl.txt
+++ /dev/null
@@ -1,97 +0,0 @@
-BSD Secure Levels Linux Security Module
-Michael A. Halcrow <mike@halcrow.us>
-Introduction
-Under the BSD Secure Levels security model, sets of policies are
-associated with levels. Levels range from -1 to 2, with -1 being the
-weakest and 2 being the strongest. These security policies are
-enforced at the kernel level, so not even the superuser is able to
-disable or circumvent them. This hardens the machine against attackers
-who gain root access to the system.
-Levels and Policies
-Level -1 (Permanently Insecure):
- - Cannot increase the secure level
-Level 0 (Insecure):
- - Cannot ptrace the init process
-Level 1 (Default):
- - /dev/mem and /dev/kmem are read-only
- - IMMUTABLE and APPEND extended attributes, if set, may not be unset
- - Cannot load or unload kernel modules
- - Cannot write directly to a mounted block device
- - Cannot perform raw I/O operations
- - Cannot perform network administrative tasks
- - Cannot setuid any file
-Level 2 (Secure):
- - Cannot decrement the system time
- - Cannot write to any block device, whether mounted or not
- - Cannot unmount any mounted filesystems
-Compilation
-To compile the BSD Secure Levels LSM, seclvl.ko, enable the
-SECURITY_SECLVL configuration option.  This is found under Security
-options -> BSD Secure Levels in the kernel configuration menu.
-Basic Usage
-Once the machine is in a running state, with all the necessary modules
-loaded and all the filesystems mounted, you can load the seclvl.ko
-module:
-# insmod seclvl.ko
-The module defaults to secure level 1, except when compiled directly
-into the kernel, in which case it defaults to secure level 0. To raise
-the secure level to 2, the administrator writes ``2'' to the
-seclvl/seclvl file under the sysfs mount point (assumed to be /sys in
-these examples):
-# echo -n "2" > /sys/seclvl/seclvl
-Alternatively, you can initialize the module at secure level 2 with
-the initlvl module parameter:
-# insmod seclvl.ko initlvl=2
-At this point, it is impossible to remove the module or reduce the
-secure level.  If the administrator wishes to have the option of doing
-so, he must provide a module parameter, sha1_passwd, that specifies
-the SHA1 hash of the password that can be used to reduce the secure
-level to 0.
-To generate this SHA1 hash, the administrator can use OpenSSL:
-# echo -n "boogabooga" | openssl sha1
-abeda4e0f33defa51741217592bf595efb8d289c
-In order to use password-instigated secure level reduction, the SHA1
-crypto module must be loaded or compiled into the kernel:
-# insmod sha1.ko
-The administrator can then insmod the seclvl module, including the
-SHA1 hash of the password:
-# insmod seclvl.ko
-         sha1_passwd=abeda4e0f33defa51741217592bf595efb8d289c
-To reduce the secure level, write the password to seclvl/passwd under
-your sysfs mount point:
-# echo -n "boogabooga" > /sys/seclvl/passwd
-The September 2004 edition of Sys Admin Magazine has an article about
-the BSD Secure Levels LSM.  I encourage you to refer to that article
-for a more in-depth treatment of this security module:
-http://www.samag.com/documents/s=9304/sam0409a/0409a.htm
diff --git a/Documentation/sh/new-machine.txt b/Documentation/sh/new-machine.txt
index eb2dd2e6993b..73988e0d112b 100644
--- a/Documentation/sh/new-machine.txt
+++ b/Documentation/sh/new-machine.txt
@@ -41,11 +41,6 @@ Board-specific code:
        |
        .. more boards here ...
-It should also be noted that each board is required to have some certain
-headers. At the time of this writing, io.h is the only thing that needs
-to be provided for each board, and can generally just reference generic
-functions (with the exception of isa_port2addr).
 Next, for companion chips:
 .
 `-- arch
@@ -104,12 +99,13 @@ and then populate that with sub-directories for each member of the family.
 Both the Solution Engine and the hp6xx boards are an example of this.
 After you have setup your new arch/sh/boards/ directory, remember that you
-also must add a directory in include/asm-sh for headers localized to this
+should also add a directory in include/asm-sh for headers localized to this
-board. In order to interoperate seamlessly with the build system, it's best
+board (if there are going to be more than one). In order to interoperate
-to have this directory the same as the arch/sh/boards/ directory name,
+seamlessly with the build system, it's best to have this directory the same
-though if your board is again part of a family, the build system has ways
+as the arch/sh/boards/ directory name, though if your board is again part of
-of dealing with this, and you can feel free to name the directory after
+a family, the build system has ways of dealing with this (via incdir-y
-the family member itself.
+overloading), and you can feel free to name the directory after the family
+member itself.
 There are a few things that each board is required to have, both in the
 arch/sh/boards and the include/asm-sh/ heirarchy. In order to better
@@ -122,6 +118,7 @@ might look something like:
 * arch/sh/boards/vapor/setup.c - Setup code for imaginary board
 */
 #include <linux/init.h>
+#include <asm/rtc.h> /* for board_time_init() */
 const char *get_system_type(void)
 {
@@ -152,79 +149,57 @@ int __init platform_setup(void)
 }
 Our new imaginary board will also have to tie into the machvec in order for it
-to be of any use. Currently the machvec is slowly on its way out, but is still
+to be of any use.
-required for the time being. As such, let us take a look at what needs to be
-done for the machvec assignment.
 machvec functions fall into a number of categories:
 - I/O functions to IO memory (inb etc) and PCI/main memory (readb etc).
- - I/O remapping functions (ioremap etc)
+ - I/O mapping functions (ioport_map, ioport_unmap, etc).
- - some initialisation functions
+ - a 'heartbeat' function.
- - a 'heartbeat' function
+ - PCI and IRQ initialization routines.
- - some miscellaneous flags
+ - Consistent allocators (for boards that need special allocators,
+   particularly for allocating out of some board-specific SRAM for DMA
-The tree can be built in two ways:
+   handles).
- - as a fully generic build. All drivers are linked in, and all functions
-   go through the machvec
+Machvec functions are added and removed over time, so always be sure to
- - as a machine specific build. In this case only the required drivers
+consult include/asm-sh/machvec.h for the current state of the machvec.
-   will be linked in, and some macros may be redefined to not go through
-   the machvec where performance is important (in particular IO functions).
+The kernel automatically substitutes generic routines for any function
+pointers left undefined in the machvec at boot time, as machvec functions are referenced
-There are three ways in which IO can be performed:
+unconditionally throughout most of the tree. Some boards have incredibly
- - none at all. This is really only useful for the 'unknown' machine type,
+sparse machvecs (such as the dreamcast and sh03), whereas others must define
-   which us designed to run on a machine about which we know nothing, and
+virtually everything (rts7751r2d).
-   so all all IO instructions do nothing.
- - fully custom. In this case all IO functions go to a machine specific
+Adding a new machine is relatively trivial (using vapor as an example):
-   set of functions which can do what they like
- - a generic set of functions. These will cope with most situations,
+If the board-specific definitions are quite minimalistic, as is the case for
-   and rely on a single function, mv_port2addr, which is called through the
+the vast majority of boards, simply having a single board-specific header is
-   machine vector, and converts an IO address into a memory address, which
+sufficient.
-   can be read from/written to directly.
+ - add a new file include/asm-sh/vapor.h which contains prototypes for
-Thus adding a new machine involves the following steps (I will assume I am
-adding a machine called vapor):
- - add a new file include/asm-sh/vapor/io.h which contains prototypes for
   any machine specific IO functions prefixed with the machine name, for
   example vapor_inb. These will be needed when filling out the machine
   vector.
-   This is the minimum that is required, however there are ample
+   Note that these prototypes are generated automatically by setting
-   opportunities to optimise this. In particular, by making the prototypes
+   __IO_PREFIX to something sensible. A typical example would be:
-   inline function definitions, it is possible to inline the function when
-   building machine specific versions. Note that the machine vector
+        #define __IO_PREFIX vapor
-   functions will still be needed, so that a module built for a generic
+        #include <asm/io_generic.h>
-   setup can be loaded.
+   somewhere in the board-specific header. Any boards being ported that still
- - add a new file arch/sh/boards/vapor/mach.c. This contains the definition
+   have a legacy io.h should remove it entirely and switch to the new model.
-   of the machine vector. When building the machine specific version, this
-   will be the real machine vector (via an alias), while in the generic
+ - Add machine vector definitions to the board's setup.c. At a bare minimum,
-   version is used to initialise the machine vector, and then freed, by
+   this must be defined as something like:
-   making it initdata. This should be defined as:
+        struct sh_machine_vector mv_vapor __initmv = {
-     struct sh_machine_vector mv_vapor __initmv = {
+                .mv_name = "vapor",
-       .mv_name = "vapor",
+        };
-     }
+        ALIAS_MV(vapor)
-     ALIAS_MV(vapor)
+ - finally add a file arch/sh/boards/vapor/io.c, which contains definitions of
- - finally add a file arch/sh/boards/vapor/io.c, which contains
+   the machine specific io functions (if there are enough to warrant it).
-   definitions of the machine specific io functions.
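The __IO_PREFIX scheme boils down to C preprocessor token pasting. The following is a minimal host-side sketch of that technique only. The helper macros IO_CONCAT, IO_EXPAND and IO_NAME, the single inb/outb pair, and the fake_ports array are hypothetical stand-ins; they are not the actual contents of asm/io_generic.h.

```c
#include <stdint.h>

/* Board-specific header sets the prefix, as in the example above. */
#define __IO_PREFIX vapor

/* Hypothetical helpers: the extra IO_EXPAND level makes the preprocessor
 * expand __IO_PREFIX to "vapor" before the ## pasting happens. */
#define IO_CONCAT(a, b) a##_##b
#define IO_EXPAND(a, b) IO_CONCAT(a, b)
#define IO_NAME(x)      IO_EXPAND(__IO_PREFIX, x)

/* Prototypes generated from the prefix: vapor_inb() and vapor_outb(). */
uint8_t IO_NAME(inb)(unsigned long port);
void IO_NAME(outb)(uint8_t value, unsigned long port);

/* Stand-in for the board's I/O space so the sketch runs on a host. */
static uint8_t fake_ports[256];

uint8_t vapor_inb(unsigned long port)
{
	return fake_ports[port & 0xff];
}

void vapor_outb(uint8_t value, unsigned long port)
{
	fake_ports[port & 0xff] = value;
}
```

Changing the single __IO_PREFIX definition renames every generated prototype, which is why the board header only needs the two lines shown above.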
-A note about initialisation functions. Three initialisation functions are
-provided in the machine vector:
- - mv_arch_init - called very early on from setup_arch
- - mv_init_irq - called from init_IRQ, after the generic SH interrupt
-   initialisation
- - mv_init_pci - currently not used
-Any other remaining functions which need to be called at start up can be
-added to the list using the __initcalls macro (or module_init if the code
-can be built as a module). Many generic drivers probe to see if the device
-they are targeting is present, however this may not always be appropriate,
-so a flag can be added to the machine vector which will be set on those
-machines which have the hardware in question, reducing the probe to a
-single conditional.
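The boot-time fallback for undefined machvec entries can be pictured as follows. This is a sketch under assumptions: struct sketch_machvec, the generic_* routines, MV_FILL and sketch_machvec_init are hypothetical names. The real structure lives in include/asm-sh/machvec.h, and the real fill-in code may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed machine vector; the real
 * struct sh_machine_vector has many more members. */
struct sketch_machvec {
	const char *mv_name;
	uint8_t (*mv_inb)(unsigned long port);
	void (*mv_heartbeat)(void);
};

/* Generic fallbacks used when a board leaves an entry undefined. */
static uint8_t generic_inb(unsigned long port)
{
	(void)port;
	return 0xff;		/* pretend nothing answers on the bus */
}

static void generic_heartbeat(void)
{
}

/* Replace a NULL entry with its generic routine. */
#define MV_FILL(mv, field, fallback)			\
	do {						\
		if ((mv)->field == NULL)		\
			(mv)->field = (fallback);	\
	} while (0)

/* Run once at boot so that every entry can be called unconditionally. */
static void sketch_machvec_init(struct sketch_machvec *mv)
{
	MV_FILL(mv, mv_inb, generic_inb);
	MV_FILL(mv, mv_heartbeat, generic_heartbeat);
}

/* A sparse board vector in the spirit of mv_vapor: only the name is set. */
static struct sketch_machvec mv_sketch_vapor = {
	.mv_name = "vapor",
};
```

After sketch_machvec_init(&mv_sketch_vapor), callers can invoke mv_sketch_vapor.mv_inb() without a NULL check. This is what allows boards to ship very sparse machvecs.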
 3. Hooking into the Build System
 ================================
@@ -303,4 +278,3 @@ which will in turn copy the defconfig for this board, run it through
 oldconfig (prompting you for any new options since the time of creation),
 and start you on your way to having a functional kernel for your new
 board.
diff --git a/Documentation/sh/register-banks.txt b/Documentation/sh/register-banks.txt
new file mode 100644
index 000000000000..a6719f2f6594
--- /dev/null
+++ b/Documentation/sh/register-banks.txt
@@ -0,0 +1,33 @@
+        Notes on register bank usage in the kernel
+        ==========================================
+Introduction
+------------
+The SH-3 and SH-4 CPU families traditionally include a single partial register
+bank (selected by SR.RB, only r0 ... r7 are banked), whereas other families
+may have more full-featured banking or simply no such capabilities at all.
+SR.RB banking
+-------------
+In the case of this type of banking, banked registers are mapped directly to
+r0 ... r7 if SR.RB is set to the bank we are interested in, otherwise ldc/stc
+can still be used to reference the banked registers (as r0_bank ... r7_bank)
+when in the context of another bank. The developer must keep the SR.RB value
+in mind when writing code that utilizes these banked registers, since the
+same r0 ... r7 names refer to different physical registers depending on
+SR.RB. Userspace is also not able to poke at the bank1 values, so these can
+be used rather effectively as scratch registers by the kernel.
+Presently the kernel uses several of these registers:
+        - r0_bank, r1_bank (referenced as k0 and k1, used for scratch
+          registers when doing exception handling).
+        - r2_bank (used to track the EXPEVT/INTEVT code)
+                - Used by do_IRQ() and friends for IRQ mapping based on the
+                  interrupt exception vector jump table offset
+        - r6_bank (global interrupt mask)
+                - The SR.IMASK interrupt handler makes use of this to set the
+                  interrupt priority level (used by local_irq_enable())
+        - r7_bank (current)
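To make the SR.RB rules concrete, here is a host-side C model of the partial bank. It assumes the SH-3/SH-4 SR layout, with RB at bit 29 and IMASK at bits 7..4. struct sh_bank_model and its helpers are hypothetical. They only mimic which physical registers plain rN accesses and ldc/stc rN_bank accesses reach.

```c
#include <stdint.h>

/* Assumed SH-3/SH-4 SR field positions. */
#define SR_RB_BIT      29u
#define SR_IMASK_SHIFT 4u
#define SR_IMASK_MASK  (0xfu << SR_IMASK_SHIFT)

/* Hypothetical model: two physical copies of r0..r7 plus SR. */
struct sh_bank_model {
	uint32_t sr;
	uint32_t bank[2][8];
};

static unsigned int active_bank(const struct sh_bank_model *c)
{
	return (c->sr >> SR_RB_BIT) & 1u;
}

/* Plain r0..r7 access hits the bank selected by SR.RB. */
static uint32_t read_rn(const struct sh_bank_model *c, unsigned int n)
{
	return c->bank[active_bank(c)][n & 7u];
}

/* stc rN_bank style access reaches the bank that is NOT selected. */
static uint32_t read_rn_bank(const struct sh_bank_model *c, unsigned int n)
{
	return c->bank[active_bank(c) ^ 1u][n & 7u];
}

/* Interrupt priority mask, as manipulated via r6_bank above. */
static unsigned int sr_imask(const struct sh_bank_model *c)
{
	return (c->sr & SR_IMASK_MASK) >> SR_IMASK_SHIFT;
}
```

With SR.RB clear, r7 and r7_bank name different registers. Code that forgets which bank is active therefore reads the wrong value silently.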
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 7cee90223d3a..20d0d797f539 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm:
 - drop-caches
 - zone_reclaim_mode
 - min_unmapped_ratio
+- min_slab_ratio
 - panic_on_oom
 ==============================================================
@@ -138,7 +139,6 @@ This is value ORed together of
 1       = Zone reclaim on
 2       = Zone reclaim writes dirty pages out
 4       = Zone reclaim swaps pages
-8       = Also do a global slab reclaim pass
 zone_reclaim_mode is set during bootup to 1 if it is determined that pages
 from remote zones will cause a measurable performance reduction. The
@@ -162,18 +162,13 @@ Allowing regular swap effectively restricts allocations to the local
 node unless explicitly overridden by memory policies or cpuset
 configurations.
-It may be advisable to allow slab reclaim if the system makes heavy
-use of files and builds up large slab caches. However, the slab
-shrink operation is global, may take a long time and free slabs
-in all nodes of the system.
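A decoding sketch for the zone_reclaim_mode bits listed above. The ZR_* names and struct zr_policy are hypothetical. The value 8 (global slab reclaim) is no longer documented because slab reclaim is now governed by min_slab_ratio, so the sketch does not decode it.

```c
#include <stdbool.h>

/* Hypothetical names for the documented bits; the sysctl value is
 * these ORed together. */
enum {
	ZR_ZONE  = 1,	/* zone reclaim on */
	ZR_WRITE = 2,	/* zone reclaim writes dirty pages out */
	ZR_SWAP  = 4,	/* zone reclaim swaps pages */
};

struct zr_policy {
	bool enabled;
	bool write_dirty;
	bool swap;
};

/* Split a zone_reclaim_mode value into its independent flags. */
static struct zr_policy decode_zone_reclaim_mode(unsigned int mode)
{
	struct zr_policy p = {
		.enabled     = (mode & ZR_ZONE) != 0,
		.write_dirty = (mode & ZR_WRITE) != 0,
		.swap        = (mode & ZR_SWAP) != 0,
	};
	return p;
}
```

For example, writing 5 (1 | 4) enables zone reclaim with swapping but without writing out dirty pages.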
 =============================================================
 min_unmapped_ratio:
 This is available only on NUMA kernels.
-A percentage of the file backed pages in each zone.  Zone reclaim will only
+A percentage of the total pages in each zone.  Zone reclaim will only
 occur if more than this percentage of pages are file backed and unmapped.
 This is to insure that a minimal amount of local pages is still available for
 file I/O even if the node is overallocated.
@@ -182,6 +177,24 @@ The default is 1 percent.
 =============================================================
+min_slab_ratio:
+This is available only on NUMA kernels.
+A percentage of the total pages in each zone.  During zone reclaim
+(which occurs when allocation falls back from the local zone), slabs will
+be reclaimed if more than this percentage of pages in a zone are
+reclaimable slab pages. This ensures that slab growth stays under control
+even in NUMA systems that rarely perform global reclaim.
+The default is 5 percent.
+Note that slab reclaim is triggered in a per-zone / per-node fashion.
+The process of reclaiming slab memory is currently not node specific
+and may not be fast.
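Both min_unmapped_ratio and min_slab_ratio are percentages of a zone's total pages, so each turns into a per-zone page threshold. The sketch below assumes hypothetical zone_sketch counters and helper names. It reduces the decision to two independent comparisons, whereas the kernel's actual zone reclaim path checks more conditions.

```c
#include <stdbool.h>

/* Hypothetical per-zone counters; "total pages" follows the text above. */
struct zone_sketch {
	unsigned long total_pages;
	unsigned long unmapped_file_pages;
	unsigned long reclaimable_slab_pages;
};

/* Convert a percentage sysctl into a per-zone page threshold. */
static unsigned long pct_of_zone(const struct zone_sketch *z, unsigned int pct)
{
	return z->total_pages * pct / 100;
}

/* Page cache reclaim only happens above min_unmapped_ratio. */
static bool should_reclaim_pagecache(const struct zone_sketch *z,
				     unsigned int min_unmapped_ratio)
{
	return z->unmapped_file_pages > pct_of_zone(z, min_unmapped_ratio);
}

/* Slab is shrunk during zone reclaim only above min_slab_ratio. */
static bool should_reclaim_slab(const struct zone_sketch *z,
				unsigned int min_slab_ratio)
{
	return z->reclaimable_slab_pages > pct_of_zone(z, min_slab_ratio);
}
```

Take a zone of 100000 pages with the default ratios (1 and 5 percent). Page cache reclaim needs more than 1000 unmapped file pages, and slab reclaim needs more than 5000 reclaimable slab pages.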
+=============================================================
 panic_on_oom
 This enables or disables panic on out-of-memory feature.  If this is set to 1,
diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt
index 867f4c38f356..39c68f8c4e6c 100644
--- a/Documentation/usb/error-codes.txt
+++ b/Documentation/usb/error-codes.txt
@@ -98,13 +98,13 @@ one or more packets could finish before an error stops further endpoint I/O.
                        error, a failure to respond (often caused by
                        device disconnect), or some other fault.
-ETIMEDOUT (**)         No response packet received within the prescribed
+-ETIME (**)             No response packet received within the prescribed
                        bus turn-around time.  This error may instead be
                        reported as -EPROTO or -EILSEQ.
-                        Note that the synchronous USB message functions
+-ETIMEDOUT              Synchronous USB message functions use this code
-                        also use this code to indicate timeout expired
+                        to indicate that the timeout expired before the transfer
-                        before the transfer completed.
+                        completed, and no other error was reported by the host controller.
 -EPIPE (**)             Endpoint stalled.  For non-control endpoints,
                        reset this status with usb_clear_halt().
@@ -163,6 +163,3 @@ usb_get_*/usb_set_*():
 usb_control_msg():
 usb_bulk_msg():
 -ETIMEDOUT              Timeout expired before the transfer completed.
-                        In the future this code may change to -ETIME,
-                        whose definition is a closer match to this sort
-                        of error.
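Callers that share code between URB completion handlers and the synchronous message functions can keep the two timeout codes apart explicitly. The following is a minimal sketch. It assumes a hypothetical helper, usb_status_str(), which covers only a few of the codes above and uses the errno values from <errno.h>.

```c
#include <errno.h>

/* Hypothetical helper mapping a few documented status codes to text. */
static const char *usb_status_str(int status)
{
	switch (status) {
	case 0:
		return "success";
	case -ETIME:		/* no response within bus turn-around time */
		return "no response within bus turn-around time";
	case -ETIMEDOUT:	/* synchronous call's timeout expired */
		return "synchronous call timed out";
	case -EPROTO:
	case -EILSEQ:		/* may also stand in for -ETIME */
		return "protocol error (may also mean no response)";
	case -EPIPE:
		return "endpoint stalled";
	default:
		return "other error";
	}
}
```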
diff --git a/Documentation/usb/usb-serial.txt b/Documentation/usb/usb-serial.txt
index 02b0f7beb6d1..a2dee6e6190d 100644
--- a/Documentation/usb/usb-serial.txt
+++ b/Documentation/usb/usb-serial.txt
@@ -433,6 +433,11 @@ Options supported:
  See http://www.uuhaus.de/linux/palmconnect.html for up-to-date
  information on this driver.
+AIRcable USB Dongle Bluetooth driver
+  If the cdc_acm driver is loaded on the system, you will find that cdc_acm
+  claims the device before AIRcable can. This is easily corrected by
+  unloading both modules and then loading the aircable module before the
+  cdc_acm module.
 Generic Serial driver
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 6da24e7a56cb..74b77f9e91bc 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -199,6 +199,11 @@ IOMMU
   allowed  overwrite iommu off workarounds for specific chipsets.
   soft  Use software bounce buffering (default for Intel machines)
   noaperture Don't touch the aperture for AGP.
+   allowdac Allow DMA >4GB
+            When off, all DMA above 4GB is forced through an IOMMU or bounce
+            buffering.
+   nodac    Forbid DMA >4GB
+   panic    Always panic when IOMMU overflows
  swiotlb=pages[,force]
@@ -245,6 +250,13 @@ Debugging
                newfallback: use new unwinder but fall back to old if it gets
                        stuck (default)
+  call_trace=[old|both|newfallback|new]
+                old: use old inexact backtracer
+                new: use new exact dwarf2 unwinder
+                both: print entries from both
+                newfallback: use new unwinder but fall back to old if it gets
+                        stuck (default)
 Misc
  noreplacement  Don't replace instructions with more appropriate ones
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks
new file mode 100644
index 000000000000..bddfddd466ab
--- /dev/null
+++ b/Documentation/x86_64/kernel-stacks
@@ -0,0 +1,99 @@
+Most of the text from Keith Owens, hacked by AK
+x86_64 page size (PAGE_SIZE) is 4K.
+Like all other architectures, x86_64 has a kernel stack for every
+active thread.  These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big.
+These stacks contain useful data as long as a thread is alive or a
+zombie. While the thread is in user space the kernel stack is empty
+except for the thread_info structure at the bottom.
+In addition to the per-thread stacks, there are specialized stacks
+associated with each cpu.  These stacks are only used while the kernel
+is in control on that cpu; when a cpu returns to user space, the
+specialized stacks contain no useful data.  The main cpu stacks are:
+* Interrupt stack.  IRQSTACKSIZE
+  Used for external hardware interrupts.  If this is the first external
+  hardware interrupt (i.e. not a nested hardware interrupt) then the
+  kernel switches from the current task to the interrupt stack.  Like
+  the split thread and interrupt stacks on i386 (with CONFIG_4KSTACKS),
+  this gives more room for kernel interrupt processing without having
+  to increase the size of every per thread stack.
+  The interrupt stack is also used when processing a softirq.
+Switching to the kernel interrupt stack is done by software based on a
+per CPU interrupt nest counter. This is needed because x86-64 "IST"
+hardware stacks cannot nest without races.
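The nest-counter rule can be modelled in C as shown below. This is a hypothetical sketch: struct cpu_irq_state, irq_enter_stack and irq_exit_stack are illustrative names, and the real switch is done in assembly in entry.S. In this model the counter starts at -1, so only the outermost hardware interrupt moves to the interrupt stack.

```c
/* Hypothetical per-cpu state; names are illustrative only. */
struct cpu_irq_state {
	int irq_nest;			/* -1 when no hardware interrupt is active */
	unsigned long irq_stack_top;	/* top of this cpu's interrupt stack */
};

/* Returns the stack pointer the handler should run on. */
static unsigned long irq_enter_stack(struct cpu_irq_state *cpu,
				     unsigned long current_sp)
{
	/* Outermost interrupt: the counter reaches 0, so switch stacks.
	 * Nested interrupts stay on the interrupt stack already in use. */
	if (++cpu->irq_nest == 0)
		return cpu->irq_stack_top;
	return current_sp;
}

/* Called on the way out of every hardware interrupt. */
static void irq_exit_stack(struct cpu_irq_state *cpu)
{
	cpu->irq_nest--;
}
```

Because the decision is made in software from a counter, a nested interrupt never reloads the stack pointer to the top of a stack that is still in use. This is the race the text says IST stacks cannot avoid.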
+x86_64 also has a feature which is not available on i386, the ability
+to automatically switch to a new stack for designated events such as
+double fault or NMI, which makes it easier to handle these unusual
+events on x86_64.  This feature is called the Interrupt Stack Table
+(IST).  There can be up to 7 IST entries per cpu. The IST code is an
+index into the Task State Segment (TSS); the IST entries in the TSS
+point to dedicated stacks, and each stack can be a different size.
+An IST is selected by a non-zero value in the IST field of an
+interrupt-gate descriptor.  When an interrupt occurs and the hardware
+loads such a descriptor, the hardware automatically sets the new stack
+pointer based on the IST value, then invokes the interrupt handler.  If
+software wants to allow nested IST interrupts then the handler must
+adjust the IST values on entry to and exit from the interrupt handler.
+(This is occasionally done, e.g. for debug exceptions.)
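The IST selection and the entry/exit adjustment described above can be sketched as follows. The sketch assumes the 64-bit interrupt-gate layout from the Intel and AMD manuals, with the IST index in bits 32..34 of the low quadword. struct tss_sketch and the helper names are hypothetical, and the model ignores the RSP0 switch on a privilege change.

```c
#include <stdint.h>

/* Assumed layout: IST index in bits 32..34 of the gate's low quadword. */
#define GATE_IST_SHIFT 32
#define GATE_IST_MASK  0x7ULL

static unsigned int gate_ist(uint64_t gate_lo)
{
	return (unsigned int)((gate_lo >> GATE_IST_SHIFT) & GATE_IST_MASK);
}

static uint64_t gate_set_ist(uint64_t gate_lo, unsigned int ist)
{
	gate_lo &= ~(GATE_IST_MASK << GATE_IST_SHIFT);
	return gate_lo | ((uint64_t)(ist & GATE_IST_MASK) << GATE_IST_SHIFT);
}

/* Hypothetical TSS model: ist[0] is unused so IST codes 1..7 index directly. */
struct tss_sketch {
	uint64_t ist[8];
};

/* Stack the handler starts on: the IST slot if selected, else unchanged. */
static uint64_t handler_stack(const struct tss_sketch *tss, uint64_t gate_lo,
			      uint64_t current_sp)
{
	unsigned int ist = gate_ist(gate_lo);

	return ist ? tss->ist[ist] : current_sp;
}

/* Lower the IST slot while a handler runs so a nested event with the same
 * code starts below the current frame instead of overwriting it. */
static void ist_enter(struct tss_sketch *tss, unsigned int ist, uint64_t size)
{
	tss->ist[ist] -= size;
}

/* Restore the slot on exit. */
static void ist_exit(struct tss_sketch *tss, unsigned int ist, uint64_t size)
{
	tss->ist[ist] += size;
}
```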
+Events with different IST codes (i.e. with different stacks) can be
+nested.  For example, a debug interrupt can safely be interrupted by an
+NMI.  arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack
+pointers on entry to and exit from all IST events, in theory allowing
+IST events with the same code to be nested.  However in most cases, the
+stack size allocated to an IST assumes no nesting for the same code.
+If that assumption is ever broken then the stacks will become corrupt.
+The currently assigned IST stacks are:
+* STACKFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+  Used for interrupt 12 - Stack Fault Exception (#SS).
+  This allows recovery from invalid stack segments, which rarely
+  happens.
+* DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+  Used for interrupt 8 - Double Fault Exception (#DF).
+  Invoked when handling an exception causes another exception. Happens
+  when the kernel is very confused (e.g. kernel stack pointer corrupt).
+  Using a separate stack allows the kernel to recover from it well enough
+  in many cases to still output an oops.
+* NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+  Used for non-maskable interrupts (NMI).
+  NMI can be delivered at any time, including when the kernel is in the
+  middle of switching stacks.  Using IST for NMI events avoids making
+  assumptions about the previous state of the kernel stack.
+* DEBUG_STACK.  DEBUG_STKSZ
+  Used for hardware debug interrupts (interrupt 1) and for software
+  debug interrupts (INT3).
+  When debugging a kernel, debug interrupts (both hardware and
+  software) can occur at any time.  Using IST for these interrupts
+  avoids making assumptions about the previous state of the kernel
+  stack.
+* MCE_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
+  Used for interrupt 18 - Machine Check Exception (#MC).
+  MCE can be delivered at any time, including when the kernel is in the
+  middle of switching stacks.  Using IST for MCE events avoids making
+  assumptions about the previous state of the kernel stack.
+For more details see the Intel IA32 or AMD AMD64 architecture manuals.